Abstract

We discuss two new methods of recovery of sparse signals from noisy observation based on $\ell_1$-minimization. While they are closely related to well-known techniques such as Lasso and Dantzig Selector, these estimators come with efficiently verifiable guarantees of performance. By optimizing these bounds with respect to the method parameters, we are able to construct estimators which possess better statistical properties than the commonly used ones.

We link our performance estimates to the well-known results of Compressive Sensing and justify our proposed approach with an oracle inequality which links the properties of the recovery algorithms to the best estimation performance attainable when the signal support is known. We also show how the estimates can be computed using the Non-Euclidean Basis Pursuit algorithm.
1 Introduction

Recently, several methods of estimation and selection based on $\ell_1$-minimization have received much attention in the statistical literature. For instance, the Lasso estimator, which is the $\ell_1$-penalized least-squares method, is probably the most studied (a theoretical analysis of the Lasso estimator is provided in, e.g., [2, 3, 4, 19, 20, 21, 17, 18]; see also the references cited therein). Another statistical estimator, closely related to the Lasso, is the Dantzig Selector [7, 2, 16, 17]. To be more precise, let us consider the following estimation problem. Assume that an observation

$$y = Ax + \sigma\xi \in \mathbb{R}^m \tag{1}$$

is available, where $x\in\mathbb{R}^n$ is an unknown signal and $A\in\mathbb{R}^{m\times n}$ is a known sensing matrix. We suppose that $\sigma\xi$ is a Gaussian disturbance with $\xi\sim N(0,I_m)$ (i.e., $\xi = (\xi_1,\ldots,\xi_m)^T$, where the $\xi_i$ are independent normal r.v.'s with zero mean and unit variance), and $\sigma > 0$ is a known deterministic noise level. Our focus is on the recovery of the unknown signal $x$.



*Research of the second author was supported by the Office of Naval Research grant # N000140811104 and the NSF grant 
DMS-0914785. 
The Dantzig Selector estimator $\hat x_{DS}$ of the signal $x$ is defined as follows [7]:

$$\hat x_{DS}(y) \in \mathop{\rm Argmin}_{v\in\mathbb{R}^n}\left\{\|v\|_1 : \|A^T(Av-y)\|_\infty \le \rho\right\},$$

where $\rho = O(\sigma\sqrt{\ln n})$ is the algorithm's parameter. Since $\hat x_{DS}$ is obtained as a solution of a linear program, it is very attractive due to its low computational cost. Accuracy bounds for this estimator are readily available. For instance, a well-known result about this estimator (cf. [7, Theorem 1.1]) is that if $\rho = O(\sigma\sqrt{\ln(n/\epsilon)})$, then

$$\|\hat x_{DS}(y) - x\|_2 \le K\sigma\sqrt{s\log(n\epsilon^{-1})}$$

with probability $1-\epsilon$ if a) the signal $x$ is $s$-sparse, i.e., has at most $s$ nonvanishing components, and b) the sensing matrix $A$ with unit columns possesses the Restricted Isometry Property RIP($\delta$, $k$) with parameters $0 < \delta < \bar\delta$ for an absolute constant $\bar\delta$ (cf. [7, Theorem 1.1]) and $k\ge 3s$.¹ Further, in this case one has $K = C(1-\delta)^{-1}$, where $C$ is a moderate absolute constant. This result is quite impressive, in part due to the fact (see, e.g., [5, 6]) that there exist $m\times n$ random matrices, with $m < n$, which possess the RIP with probability close to 1, with $\delta$ close to zero and with $k$ as large as $O(m\ln^{-1}(n/m))$. Similar performance guarantees are known for the Lasso recovery

$$\hat x_{\rm lasso}(y) \in \mathop{\rm Argmin}_{v\in\mathbb{R}^n}\left\{\|v\|_1 + \varkappa\|Av - y\|_2^2\right\},$$

with a properly chosen penalty parameter $\varkappa$. The available accuracy bounds for the Lasso and the Dantzig Selector rely upon the Restricted Isometry Property or upon less restrictive assumptions about the sensing matrix, such as the Restricted Eigenvalue [2] or Compatibility [3] conditions (a complete overview of those and several other assumptions, with a description of how they relate to each other, is provided in [19]). However, these assumptions cannot be verified efficiently. This implies that there is currently no way to provide any guarantees (e.g., confidence sets) of the performance of the proposed procedures. A notable exception to this rule is the Mutual Incoherence assumption (see, e.g., [10, 11, 12] and [21] for the case of, respectively, deterministic and random observation noise), which can be used to compute the accuracy bounds for recovery algorithms: a matrix $A$ with columns of unit $\ell_2$-norm and mutual incoherence $\mu(A)$ possesses RIP($\delta$, $k$) with $\delta = (k-1)\mu(A)$.² Unfortunately, the latter relation implies that $\mu(A)$ should be very small to certify the possibility of accurate $\ell_1$-recovery of nontrivial sparse signals, so that performance guarantees based on mutual incoherence are very conservative. This "theoretical observation" is supported by numerical experiments: the practical guarantees which may be obtained using the mutual incoherence are generally quite poor even for problems with nice theoretical properties (cf. [14, 15]).

Recently, the authors proposed a new approach for efficiently computing upper and lower bounds on the "level of goodness" of a sensing matrix $A$, i.e., the maximal $s$ such that $\ell_1$-recovery of all signals with no more than $s$ nonvanishing components is accurate in the case where the measurement noise vanishes (see [14]). In the present paper, we aim to use the related verifiable sufficient conditions of "goodness" of a sensing matrix $A$ to provide efficiently computable bounds for the error of $\ell_1$ recovery procedures in the case when the observations are affected by random noise.

The main body of the paper is organized as follows: 

¹ Recall that RIP($\delta$, $k$), also called the uniform uncertainty principle, means that for any $v\in\mathbb{R}^n$ with at most $k$ nonzero entries,

$$(1-\delta)\|v\|_2^2 \le \|Av\|_2^2 \le (1+\delta)\|v\|_2^2.$$

This property essentially requires that every set of at most $k$ columns of $A$ approximately behaves like an orthonormal system.

² The mutual incoherence $\mu(A)$ of a sensing matrix $A = [A_1,\ldots,A_n]$ is computed according to

$$\mu(A) = \max_{i\ne j}\frac{|A_i^TA_j|}{A_j^TA_j}.$$

Obviously, the mutual incoherence can be easily computed even for large matrices.
1. We start with Section 2.1, where we formulate the sparse recovery problem and introduce our core assumption: a verifiable condition $H_{s,\infty}(\kappa)$ linking a matrix $A\in\mathbb{R}^{m\times n}$ and a contrast matrix $H\in\mathbb{R}^{m\times n}$. In Sections 2.2 and 2.3 we present two recovery routines with contrast matrices:

• regular recovery:

$$\hat x_{\rm reg}(y) \in \mathop{\rm Argmin}_v\left\{\|v\|_1 : \|H^T(Av-y)\|_\infty \le \rho\right\},$$

• penalized recovery:

$$\hat x_{\rm pen}(y) \in \mathop{\rm Argmin}_v\left\{\|v\|_1 + \theta s\|H^T(Av-y)\|_\infty\right\},$$

($s$ is our guess for the number of nonzero entries in the true signal, $\theta > 0$ is the penalty parameter)

along with their performance guarantees under condition $H_{s,\infty}(\kappa)$ with $\kappa < 1/2$, that is, explicit upper bounds on the confidence levels of the recovery errors $\|\hat x - x\|_p$. The novelty here is that our bounds are of the form

$$\mathrm{Prob}\left\{\|\hat x - x\|_p \le O\big(s^{1/p}\sigma\sqrt{\ln(n/\epsilon)}\big)\ \text{for every $s$-sparse signal $x$ and all } 1\le p\le\infty\right\} \ge 1-\epsilon \tag{2}$$

(with hidden factors in $O(\cdot)$ independent of $\epsilon$ and $\sigma$), and are valid in the entire range $1\le p\le\infty$ of values of $p$. Note that similar error bounds for the Dantzig Selector and Lasso are only known for $1\le p\le 2$, whatever the assumptions on an "essentially nonsquare" matrix $A$.

2. Our interest in condition $H_{s,\infty}(\kappa)$ stems from the fact that this condition, in contrast to the majority of the known sufficient conditions for the validity of $\ell_1$-based sparse recovery (e.g., Restricted Isometry/Eigenvalue/Compatibility), is efficiently verifiable. Moreover, it turns out that one can efficiently optimize, over the contrast matrix $H$, the error bounds of the regular/penalized recovery routines associated with this verifiable condition. The related issues are considered in Section 3. In Section 4 we provide some additional justification of the condition $H_{s,\infty}$, in particular, by linking it with the Mutual Incoherence and Restricted Isometry properties. This, in particular, implies that the condition $H_{s,\infty}(\kappa)$ with, say, $\kappa = 1/3$, associated with randomly selected $m\times n$ matrices $A$, is feasible, with probability approaching 1 as $m,n$ grow, for $s$ as large as $O(\sqrt{m/\ln(n)})$. We also establish limits of performance of the condition; specifically, we show that unless $A$ is nearly square, $H_{s,\infty}(\kappa)$ with $\kappa < 1/2$ can be feasible only when $s\le O(1)\sqrt m$, meaning that the tractability of the condition has a heavy price: when designing and validating $\ell_1$-minimization-based sparse recovery routines, this condition can be useful only in a severely restricted range of the sparsity parameter $s$.

3. In Section 5 we show that the condition $H_{s,\infty}(\kappa)$ is the strongest (and seemingly the only verifiable one) in a natural family of conditions $H_{s,q}(\kappa)$ linking a sensing and a contrast matrix; here $s$ is the number of nonzeros in the sparse signal to be recovered and $q\in[1,\infty]$. We demonstrate that when a contrast matrix $H$ satisfies $H_{s,q}(\kappa)$ with $\kappa < 1/2$, the associated regular and penalized $\ell_1$ recoveries admit error bounds similar to (2), but now in the restricted range $1\le p\le q$ of values of $p$. We also demonstrate that feasibility of $H_{s,q}(\kappa)$ with $\kappa < 1/2$ implies instructive (although slightly worse than those in (2)) error bounds for the Dantzig Selector and Lasso recovery routines.

4. In Section 6, we present numerical results comparing regular/penalized $\ell_1$ recovery with the Dantzig Selector and Lasso algorithms. The conclusion suggested by these preliminary numerical results is that when the former procedures are applicable (i.e., when the techniques of Section 3 allow us to build a "not too large" contrast matrix satisfying the condition $H_{s,\infty}(\kappa)$ with, say, $\kappa = 1/3$), our procedures significantly outperform the Dantzig Selector and work exactly as well as the Lasso algorithm with an "ideal" (unrealistic in actual applications) choice of the regularization parameter.³

³ With a "theoretically optimal," rather than "ideal," choice of the regularization parameter in Lasso, this algorithm is essentially worse than our algorithms utilizing the contrast matrix.






5. In the concluding Section 7 we present a "Non-Euclidean Matching Pursuit algorithm" (similar to the one presented in [15]) with the same performance characteristics as those of regular/penalized $\ell_1$ recoveries; this algorithm, however, does not require optimization and can be considered as a computationally cheap alternative to $\ell_1$ recoveries, especially when one needs to process a series of recovery problems with a common sensing matrix.

All proofs are placed in the Appendix. 

2 Accuracy bounds for $\ell_1$-Recovery Routines
2.1 Problem statement 

Notation. For a vector $x\in\mathbb{R}^n$ and $1\le s\le n$ we denote by $x^s$ the vector obtained from $x$ by setting to 0 all but the $s$ largest in magnitude entries of $x$. Ties, if any, could be resolved arbitrarily; for the sake of definiteness, assume that among entries of equal magnitudes, those with smaller indexes have priority (e.g., with $x = [1;2;2;3]$ one has $x^2 = [0;2;0;3]$). $\|x\|_{s,p}$ stands for the usual $\ell_p$-norm of $x^s$ (so that $\|x\|_{s,\infty} = \|x\|_\infty$). We say that a vector $z$ is $s$-sparse if it has at most $s$ nonzero entries. Finally, for a set $I\subset\{1,\ldots,n\}$ we denote by $\bar I$ its complement $\{1,\ldots,n\}\setminus I$; given $x\in\mathbb{R}^n$, we denote by $x_I$ the vector obtained from $x$ by zeroing the entries with indices outside of $I$, so that $x = x_I + x_{\bar I}$.
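The notation $x^s$ and $\|x\|_{s,p}$ is easy to get subtly wrong because of the tie-breaking rule. A minimal NumPy sketch, assuming 0-based indexing (so "smaller index first" becomes a stable sort on $-|x|$); the function names `sparse_part` and `norm_s_p` are ours, not the paper's:

```python
import numpy as np


def sparse_part(x, s):
    """Return x^s: keep the s largest-magnitude entries of x, zero the rest.

    Ties are resolved in favour of smaller indices, as in the text; a stable
    sort of -|x| produces exactly this order.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(-np.abs(x), kind="stable")  # descending |x_i|, ties by index
    xs = np.zeros_like(x)
    keep = order[:s]
    xs[keep] = x[keep]
    return xs


def norm_s_p(x, s, p):
    """||x||_{s,p}: the usual l_p-norm of x^s (p = np.inf is allowed)."""
    return float(np.linalg.norm(sparse_part(x, s), ord=p))


x = np.array([1.0, 2.0, 2.0, 3.0])
print(sparse_part(x, 2))        # [0. 2. 0. 3.], the example from the text
print(norm_s_p(x, 2, 1))        # 5.0
print(norm_s_p(x, 2, np.inf))   # 3.0, i.e. ||x||_{s,inf} = ||x||_inf
```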

Given a norm $\nu(\cdot)$ on $\mathbb{R}^m$ and a matrix $H = [h_1,\ldots,h_N]\in\mathbb{R}^{m\times N}$, we set $\nu(H) = \max_j\nu(h_j)$.

The problem. We consider an observation $y\in\mathbb{R}^m$,

$$y = Ax + u + \sigma\xi, \tag{3}$$

where $x\in\mathbb{R}^n$ is an unknown signal and $A\in\mathbb{R}^{m\times n}$ is the sensing matrix. We suppose that $\sigma\xi$ is a Gaussian disturbance, where $\xi\sim N(0,I_m)$ (i.e., $\xi = (\xi_1,\ldots,\xi_m)^T$ with independent normal random variables $\xi_i$ with zero mean and unit variance), $\sigma > 0$ being known, and $u$ is a nuisance parameter known to belong to a given uncertainty set $U\subset\mathbb{R}^m$, which we suppose to be convex, compact and symmetric w.r.t. the origin. Our goal is to recover $x$ from $y$, provided that $x$ is "nearly $s$-sparse." Specifically, we consider the sets

$$X(s,\upsilon) = \{x\in\mathbb{R}^n : \|x - x^s\|_1\le\upsilon\}$$

of signals which admit an $s$-sparse approximation of $\|\cdot\|_1$-accuracy $\upsilon$. Given $p$, $1\le p\le\infty$, and a confidence level $1-\epsilon$, $\epsilon\in(0,1)$, we quantify a recovery routine (a Borel function $\mathbb{R}^m\ni y\mapsto\hat x(y)\in\mathbb{R}^n$) by its worst-case, over $x\in X(s,\upsilon)$, confidence interval, taken w.r.t. the $\|\cdot\|_p$-norm of the error. Specifically, we define the risks of a recovery routine as

$$\mathrm{Risk}_p(\hat x(\cdot)|\epsilon,\sigma,s,\upsilon) = \inf\left\{\delta : \mathrm{Prob}\{\xi : \exists x\in X(s,\upsilon), u\in U : \|\hat x(Ax+\sigma\xi+u) - x\|_p > \delta\}\le\epsilon\right\}.$$

Equivalently, $\mathrm{Risk}_p(\hat x(\cdot)|\epsilon,\sigma,s,\upsilon)\le\delta$ if and only if there exists a set $\Xi$ of "good" realizations of $\xi$ with $\mathrm{Prob}\{\xi\in\Xi\}\ge 1-\epsilon$ such that whenever $\xi\in\Xi$, one has $\|\hat x(Ax+\sigma\xi+u) - x\|_p\le\delta$ for all $x\in X(s,\upsilon)$ and all $u\in U$.

Norm $\nu(\cdot)$. Given $\epsilon$ and $\sigma > 0$, let us denote

$$\nu(v) = \nu_{\epsilon,\sigma,U}(v) = \sup_{u\in U}u^Tv + \sigma\sqrt{2\ln(n/\epsilon)}\,\|v\|_2. \tag{4}$$

Since $U$ is convex, closed and symmetric with respect to the origin, $\nu(\cdot)$ is a norm. Let $\nu_*$ be the norm on $\mathbb{R}^m$ conjugate to $\nu$:

$$\nu_*(u) = \max_v\{v^Tu : \nu(v)\le 1\}.$$
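For a concrete feel of (4), here is a small NumPy sketch under an illustrative assumption that the text does not make: $U = \{u : \|u\|_\infty\le r\}$, for which $\sup_{u\in U}u^Tv = r\|v\|_1$ (the "no nuisance" case $U = \{0\}$ corresponds to $r = 0$). The helper names `nu_box` and `nu_matrix` are hypothetical; $\nu(H)$ is computed column-wise as defined above, and the values $\nu(h_i)$ are exactly the lower bounds on $\rho_i$ required later in (8).

```python
import numpy as np


def nu_box(v, r, sigma, eps, n):
    """nu(v) of (4) for the illustrative box U = {u : ||u||_inf <= r}.

    For this U, sup_{u in U} u^T v = r * ||v||_1; r = 0 gives U = {0}.
    n is the signal dimension appearing in ln(n/eps).
    """
    v = np.asarray(v, dtype=float)
    return float(r * np.abs(v).sum() + sigma * np.sqrt(2.0 * np.log(n / eps)) * np.linalg.norm(v))


def nu_matrix(H, r, sigma, eps, n):
    """nu(H) = max_j nu(h_j) over the columns h_j of H."""
    H = np.asarray(H, dtype=float)
    return max(nu_box(H[:, j], r, sigma, eps, n) for j in range(H.shape[1]))


# Hypothetical numbers: m = n = 2, H = I, r = 0.5, sigma = 1, eps chosen so that ln(n/eps) = 2.
eps = 2.0 / np.e ** 2
print(nu_matrix(np.eye(2), 0.5, 1.0, eps, 2))   # 0.5 * 1 + sqrt(2 * 2) * 1 = 2.5 (up to rounding)
```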
Conditions $H(\gamma)$ and $H_{s,\infty}(\kappa)$. Let $\gamma = (\gamma_1,\ldots,\gamma_n)\in\mathbb{R}^n_+$. Given $A\in\mathbb{R}^{m\times n}$, consider the following condition on a matrix $H = [h_1,\ldots,h_n]\in\mathbb{R}^{m\times n}$:

$H(\gamma)$: for all $x\in\mathbb{R}^n$ and $1\le i\le n$ one has

$$|x_i|\le|h_i^TAx| + \gamma_i\|x\|_1. \tag{5}$$

Now let $s$ be a positive integer and $\kappa > 0$. Given $A\in\mathbb{R}^{m\times n}$, we say that a matrix $H = [h_1,\ldots,h_n]\in\mathbb{R}^{m\times n}$ satisfies condition $H_{s,\infty}(\kappa)$⁴ if

$$\forall x\in\mathbb{R}^n:\quad \|x\|_\infty\le\|H^TAx\|_\infty + s^{-1}\kappa\|x\|_1. \tag{6}$$
The conditions we have introduced are closely related to each other: 

Lemma 1 If $H$ satisfies $H(\gamma)$, then $H$ satisfies $H_{s,\infty}(s\|\gamma\|_\infty)$, and "nearly vice versa": given $H\in\mathbb{R}^{m\times n}$ satisfying $H_{s,\infty}(\kappa)$, one can efficiently build a matrix $H'\in\mathbb{R}^{m\times n}$ satisfying $H(\gamma)$ with $\gamma = \frac{\kappa}{s}[1;\ldots;1]$ (i.e., $\kappa = s\|\gamma\|_\infty$) and such that the columns of $H'$ are convex combinations of the columns of $H$ and $-H$, so that $\nu(H')\le\nu(H)$ for every norm $\nu(\cdot)$ on $\mathbb{R}^m$.

2.2 Regular $\ell_1$ Recovery

In this section we discuss the properties of the regular $\ell_1$-recovery $\hat x_{\rm reg}$ given by

$$\hat x_{\rm reg} = \hat x_{\rm reg}(y)\in\mathop{\rm Argmin}_v\left\{\|v\|_1 : |h_i^T(Av-y)|\le\rho_i,\ i = 1,\ldots,n\right\}, \tag{7}$$

where $y$ is as in (3), $h_i$, $i = 1,\ldots,n$, are some vectors in $\mathbb{R}^m$ and $\rho_i > 0$, $i = 1,\ldots,n$. We refer to the matrix $H = [h_1,\ldots,h_n]$ as the contrast matrix underlying the recovery procedure.
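Since (7) is a linear program, it can be sketched with SciPy's `linprog` (HiGHS backend) by splitting $|v|\le t$ in the usual way. This is only an illustrative formulation with our own variable layout $[v;t]$ and our own function name `regular_recovery`, not the authors' code. Note that with $H = A$ and all $\rho_i = \rho$ the constraints become $\|A^T(Av-y)\|_\infty\le\rho$, i.e., (7) reduces to the Dantzig Selector.

```python
import numpy as np
from scipy.optimize import linprog


def regular_recovery(A, H, y, rho):
    """Solve (7): min ||v||_1 s.t. |h_i^T (A v - y)| <= rho_i for every column h_i of H.

    LP in the variables (v, t) with -t <= v <= t; returns None if the LP is not solved.
    """
    A, H, y = (np.asarray(M, dtype=float) for M in (A, H, y))
    n = A.shape[1]
    rho = np.broadcast_to(np.asarray(rho, dtype=float), (H.shape[1],))
    G = H.T @ A                      # rows h_i^T A
    g = H.T @ y                      # entries h_i^T y
    I = np.eye(n)
    Z = np.zeros((G.shape[0], n))
    A_ub = np.vstack([
        np.hstack([I, -I]),          #  v - t <= 0
        np.hstack([-I, -I]),         # -v - t <= 0
        np.hstack([G, Z]),           #  h_i^T A v <= rho_i + h_i^T y
        np.hstack([-G, Z]),          # -h_i^T A v <= rho_i - h_i^T y
    ])
    b_ub = np.concatenate([np.zeros(2 * n), rho + g, rho - g])
    c = np.concatenate([np.zeros(n), np.ones(n)])   # objective sum(t) = ||v||_1 at optimum
    bounds = [(None, None)] * n + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n] if res.status == 0 else None


# Toy check: A = H = I decouples (7) into soft thresholding of y at level rho.
print(regular_recovery(np.eye(3), np.eye(3), [3.0, -0.05, 1.0], 0.1))  # approx. [2.9, 0, 0.9]
```

In the toy call each constraint reads $|v_i - y_i|\le 0.1$, so the minimal $\ell_1$-norm point moves every $y_i$ toward 0 by at most $0.1$.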
The starting point of our developments is the following 

Proposition 1 Given an $m\times n$ sensing matrix $A$, noise intensity $\sigma$, uncertainty set $U$ and a tolerance $\epsilon\in(0,1)$, let the matrix $H = [h_1,\ldots,h_n]$ from (7) satisfy the condition $H(\gamma)$ for some $\gamma\in\mathbb{R}^n_+$, and let $\rho_i$ in (7) satisfy the relation

$$\rho_i\ge\nu_i := \nu(h_i),\quad i = 1,\ldots,n, \tag{8}$$

where $\nu(\cdot)$ is given by (4). Then there exists a set $\Xi\subset\mathbb{R}^m$, $\mathrm{Prob}\{\xi\in\Xi\}\ge 1-\epsilon$, of "good" realizations of $\xi$ such that

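(i) Whenever $\xi\in\Xi$, for every $x\in\mathbb{R}^n$, every $u\in U$ and every subset $I\subset\{1,\ldots,n\}$ such that

$$\gamma_I := \sum_{i\in I}\gamma_i < \tfrac12, \tag{9}$$

the regular $\ell_1$-recovery $\hat x_{\rm reg}$ given by (7) satisfies:

$$\begin{aligned}
&(a)\quad \|\hat x_{\rm reg}(Ax+\sigma\xi+u) - x\|_1 \le \frac{2\|x_{\bar I}\|_1 + 2\rho_I + 2\nu_I}{1-2\gamma_I},\\
&(b)\quad |[\hat x_{\rm reg}(Ax+\sigma\xi+u) - x]_i| \le \rho_i + \nu_i + \gamma_i\|\hat x_{\rm reg}(y) - x\|_1 \le \rho_i + \nu_i + \gamma_i\,\frac{2\|x_{\bar I}\|_1 + 2\rho_I + 2\nu_I}{1-2\gamma_I},\quad 1\le i\le n,
\end{aligned}\tag{10}$$

where $\rho_I = \sum_{i\in I}\rho_i$ and $\nu_I = \sum_{i\in I}\nu_i$.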

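(ii) In particular, when setting

$$\begin{aligned}
&\rho_s = \|[\rho_1;\ldots;\rho_n]\|_{s,1},\quad \nu_s = \|[\nu(h_1);\ldots;\nu(h_n)]\|_{s,1},\quad \gamma_s = \|[\gamma_1;\ldots;\gamma_n]\|_{s,1},\\
&\rho = \rho_1 = \max_i\rho_i,\quad \nu(H) = \nu_1 = \max_i\nu(h_i),\quad \gamma = \gamma_1 = \max_i\gamma_i,
\end{aligned}\tag{11}$$

⁴ The reason for this notation, cumbersome at first glance, will become clear later, in Section 5.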
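and assuming $\gamma_s < \frac12$, for every $x\in\mathbb{R}^n$, $\xi\in\Xi$ and $u\in U$ it holds

$$\begin{aligned}
&\|\hat x_{\rm reg}(Ax+\sigma\xi+u) - x\|_1 \le 2\,\frac{\|x-x^s\|_1 + \rho_s + \nu_s}{1-2\gamma_s},\\
&\|\hat x_{\rm reg}(Ax+\sigma\xi+u) - x\|_\infty \le \rho + \nu(H) + 2\gamma\,\frac{\|x-x^s\|_1 + \rho_s + \nu_s}{1-2\gamma_s}.
\end{aligned}$$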

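(iii) Finally, assuming $s\gamma < 1/2$, for every $\xi\in\Xi$, $x\in\mathbb{R}^n$ and $u\in U$ one has

$$\begin{aligned}
&\|\hat x_{\rm reg}(Ax+\sigma\xi+u) - x\|_1 \le 2\,\frac{\|x-x^s\|_1}{1-2s\gamma} + 2s\,\frac{\rho+\nu(H)}{1-2s\gamma},\\
&\|\hat x_{\rm reg}(Ax+\sigma\xi+u) - x\|_\infty \le s^{-1}\frac{\|x-x^s\|_1}{1-2s\gamma} + \frac{\rho+\nu(H)}{1-2s\gamma}.
\end{aligned}\tag{12}$$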

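The following result is an immediate corollary of Proposition 1:

Lemma 2 Under the premise of Proposition 1, assume that $\gamma_s < \frac12$. Then for all $1\le p\le\infty$ and $\upsilon\ge 0$:

$$\mathrm{Risk}_p(\hat x_{\rm reg}(\cdot)|\epsilon,\sigma,s,\upsilon) \le \frac{2}{1-2\gamma_s}\,[\upsilon + \rho_s + \nu_s]^{\frac1p}\,\big[\gamma\upsilon + [\tfrac12 - \gamma_s][\rho + \nu(H)] + \gamma(\rho_s + \nu_s)\big]^{\frac{p-1}{p}} \tag{13}$$

(for notation, see (11)). Further, if $s\gamma < 1/2$, we also have

$$1\le p\le\infty\ \Rightarrow\ \mathrm{Risk}_p(\hat x_{\rm reg}(\cdot)|\epsilon,\sigma,s,\upsilon) \le \frac{(2s)^{1/p}}{1-2s\gamma}\left(s^{-1}\upsilon + \rho + \nu(H)\right). \tag{14}$$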

The next statement is similar to the case $\kappa := s\gamma < 1/2$ in Proposition 1 and Lemma 2; the difference is that now we assume that $H$ satisfies $H_{s,\infty}(\kappa)$, which, by Lemma 1, is a weaker requirement on $H$ than to satisfy $H(\gamma)$ with $s\gamma = s\|\gamma\|_\infty = \kappa$.

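Proposition 2 Given an $m\times n$ sensing matrix $A$, noise intensity $\sigma$, uncertainty set $U$ and a tolerance $\epsilon\in(0,1)$, let the matrix $H = [h_1,\ldots,h_n]$ from (7) satisfy the condition $H_{s,\infty}(\kappa)$ for some $\kappa < 1/2$, and let $\rho_i$ in (7) satisfy the relation (8). Then there exists a set $\Xi\subset\mathbb{R}^m$, $\mathrm{Prob}\{\xi\in\Xi\}\ge 1-\epsilon$, of "good" realizations of $\xi$ such that whenever $\xi\in\Xi$, for every $x\in\mathbb{R}^n$ and every $u\in U$ one has

$$\begin{aligned}
&\|\hat x_{\rm reg}(Ax+\sigma\xi+u) - x\|_1 \le 2\,\frac{\|x-x^s\|_1}{1-2\kappa} + 2s\,\frac{\rho+\nu(H)}{1-2\kappa},\\
&\|\hat x_{\rm reg}(Ax+\sigma\xi+u) - x\|_\infty \le s^{-1}\frac{\|x-x^s\|_1}{1-2\kappa} + \frac{\rho+\nu(H)}{1-2\kappa}.
\end{aligned}\tag{15}$$

In particular,

$$1\le p\le\infty\ \Rightarrow\ \mathrm{Risk}_p(\hat x_{\rm reg}(\cdot)|\epsilon,\sigma,s,\upsilon)\le\frac{(2s)^{1/p}}{1-2\kappa}\left(s^{-1}\upsilon + \rho + \nu(H)\right). \tag{16}$$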
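The passage from the $\ell_1$/$\ell_\infty$ bounds of Proposition 2 to the risk bound (16) rests only on the elementary interpolation inequality $\|z\|_p\le\|z\|_\infty^{1-1/p}\|z\|_1^{1/p}$, which follows from $\sum_i|z_i|^p\le\|z\|_\infty^{p-1}\|z\|_1$. A sketch of the step: write $z = \hat x_{\rm reg}(Ax+\sigma\xi+u)-x$ for $x\in X(s,\upsilon)$, $u\in U$, $\xi\in\Xi$, and $D = (s^{-1}\upsilon+\rho+\nu(H))/(1-2\kappa)$, so that the two bounds of Proposition 2 give $\|z\|_\infty\le D$ and $\|z\|_1\le 2sD$; then

$$\|z\|_p\le\|z\|_\infty^{1-\frac1p}\|z\|_1^{\frac1p}\le D^{1-\frac1p}(2sD)^{\frac1p} = \frac{(2s)^{1/p}}{1-2\kappa}\left(s^{-1}\upsilon+\rho+\nu(H)\right).$$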

2.3 Penalized $\ell_1$ Recovery

Now consider the penalized $\ell_1$-recovery $\hat x_{\rm pen}$ defined as follows:

$$\hat x_{\rm pen}(y)\in\mathop{\rm Argmin}_v\left\{\|v\|_1 + \theta s\|H^T(Av-y)\|_\infty\right\}, \tag{17}$$

where $y$ is as in (3), and an integer $s\le n$, a positive $\theta$, and a matrix $H$ are parameters of the construction.
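Problem (17) is also a linear program once the $\ell_\infty$ term is replaced by an epigraph variable $\tau$. A SciPy sketch with our own variable layout $[v;t;\tau]$ and our own function name `penalized_recovery` (not the authors' implementation); $\theta = 2$ is the default only because that is the value used in Propositions 3(ii) and 4:

```python
import numpy as np
from scipy.optimize import linprog


def penalized_recovery(A, H, y, s, theta=2.0):
    """Solve (17): min ||v||_1 + theta * s * ||H^T (A v - y)||_inf.

    LP in (v, t, tau): -t <= v <= t and -tau <= H^T (A v - y) <= tau.
    The LP is always feasible, so the optimal v is returned directly.
    """
    A, H, y = (np.asarray(M, dtype=float) for M in (A, H, y))
    n = A.shape[1]
    G, g = H.T @ A, H.T @ y
    N = G.shape[0]
    I = np.eye(n)
    zn, zN, oN = np.zeros((n, 1)), np.zeros((N, n)), np.ones((N, 1))
    A_ub = np.vstack([
        np.hstack([I, -I, zn]),      #  v - t <= 0
        np.hstack([-I, -I, zn]),     # -v - t <= 0
        np.hstack([G, zN, -oN]),     #  H^T A v - tau <=  H^T y
        np.hstack([-G, zN, -oN]),    # -H^T A v - tau <= -H^T y
    ])
    b_ub = np.concatenate([np.zeros(2 * n), g, -g])
    c = np.concatenate([np.zeros(n), np.ones(n), [theta * s]])
    bounds = [(None, None)] * n + [(0, None)] * (n + 1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n]


# Toy checks with A = H = I and s = 1, theta = 2 (so theta * s = 2):
print(penalized_recovery(np.eye(3), np.eye(3), [2.0, 0.0, 0.0], s=1))  # approx. [2, 0, 0]
print(penalized_recovery(np.eye(4), np.eye(4), np.ones(4), s=1))       # approx. the zero vector
```

In the second toy call, keeping $v = y$ costs $\|y\|_1 = 4$, while $v = 0$ costs only $\theta s\|y\|_\infty = 2$, so the penalized recovery prefers the zero vector; in the first call the single spike is kept.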

Proposition 3 Given an $m\times n$ sensing matrix $A$, an integer $s\le n$, a matrix $H = [h_1,\ldots,h_n]\in\mathbb{R}^{m\times n}$ and positive reals $\gamma_i$, $1\le i\le n$, satisfying the condition $H(\gamma)$, and $\theta > 0$, assume that

$$\gamma_s := \|\gamma\|_{s,1} < \tfrac12 \tag{18}$$

and

$$(1-\gamma_s)^{-1} < \theta < (\gamma_s)^{-1}. \tag{19}$$

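Further, let $\sigma > 0$, $\epsilon\in(0,1)$, and let

$$\nu_i = \nu_{\epsilon,\sigma,U}(h_i),\ i = 1,\ldots,n,\qquad \nu(H) = \max_i\nu_i. \tag{20}$$

Consider the penalized recovery $\hat x_{\rm pen}(\cdot)$ associated with $H$, $s$, $\theta$. There exists a set $\Xi\subset\mathbb{R}^m$, $\mathrm{Prob}\{\xi\in\Xi\}\ge 1-\epsilon$, of "good" realizations of $\xi$ such that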

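(i) Whenever $\xi\in\Xi$, for every signal $x\in\mathbb{R}^n$ and every $u\in U$ one has

$$\begin{aligned}
&(a)\quad \|\hat x_{\rm pen}(Ax+\sigma\xi+u) - x\|_1 \le \frac{2\left[\|x-x^s\|_1 + \theta s\,\nu(H)\right]}{\min[\theta(1-\gamma_s)-1,\ 1-\theta\gamma_s]},\\
&(b)\quad \|\hat x_{\rm pen}(Ax+\sigma\xi+u) - x\|_\infty \le \Big(\frac{1}{\theta s}+\gamma\Big)\|\hat x_{\rm pen}(Ax+\sigma\xi+u) - x\|_1 + 2\nu(H)\\
&\qquad\qquad \le 2\Big(\frac{1}{\theta s}+\gamma\Big)\frac{\|x-x^s\|_1 + \theta s\,\nu(H)}{\min[\theta(1-\gamma_s)-1,\ 1-\theta\gamma_s]} + 2\nu(H),
\end{aligned}\tag{21}$$

where, as in Lemma 2, $\gamma = \max_i\gamma_i$.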

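(ii) When $\theta = 2$ and $\gamma < \frac{1}{2s}$, one has for every $x\in\mathbb{R}^n$, $u\in U$ and $\xi\in\Xi$:

$$\begin{aligned}
&(a)\quad \|\hat x_{\rm pen}(Ax+\sigma\xi+u) - x\|_1 \le 2\,\frac{\|x-x^s\|_1}{1-2s\gamma} + 4s\,\frac{\nu(H)}{1-2s\gamma},\\
&(b)\quad \|\hat x_{\rm pen}(Ax+\sigma\xi+u) - x\|_\infty \le 2s^{-1}\frac{\|x-x^s\|_1}{1-2s\gamma} + 4\,\frac{\nu(H)}{1-2s\gamma},
\end{aligned}\tag{22}$$

whence for every $\upsilon\ge 0$ and $1\le p\le\infty$:

$$\mathrm{Risk}_p(\hat x_{\rm pen}(\cdot)|\epsilon,\sigma,s,\upsilon)\le\frac{2s^{1/p}}{1-2s\gamma}\left(s^{-1}\upsilon + 2\nu(H)\right). \tag{23}$$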

The next statement is in the same relation to Proposition 3 as Proposition 2 is to Proposition 1 and Lemma 
2. 

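Proposition 4 Given an $m\times n$ sensing matrix $A$, noise intensity $\sigma$, uncertainty set $U$ and a tolerance $\epsilon\in(0,1)$, let the matrix $H = [h_1,\ldots,h_n]$ from (17) satisfy the condition $H_{s,\infty}(\kappa)$ for some $\kappa < 1/2$, and let $\theta = 2$. Then there exists a set $\Xi\subset\mathbb{R}^m$, $\mathrm{Prob}\{\xi\in\Xi\}\ge 1-\epsilon$, of "good" realizations of $\xi$ such that whenever $\xi\in\Xi$, for every $x\in\mathbb{R}^n$ and every $u\in U$ one has

$$\begin{aligned}
&\|\hat x_{\rm pen}(Ax+\sigma\xi+u) - x\|_1 \le 2\,\frac{\|x-x^s\|_1}{1-2\kappa} + 4s\,\frac{\nu(H)}{1-2\kappa},\\
&\|\hat x_{\rm pen}(Ax+\sigma\xi+u) - x\|_\infty \le 2s^{-1}\frac{\|x-x^s\|_1}{1-2\kappa} + 4\,\frac{\nu(H)}{1-2\kappa};
\end{aligned}\tag{24}$$

in particular,

$$1\le p\le\infty\ \Rightarrow\ \mathrm{Risk}_p(\hat x_{\rm pen}(\cdot)|\epsilon,\sigma,s,\upsilon)\le\frac{2s^{1/p}}{1-2\kappa}\left(s^{-1}\upsilon + 2\nu(H)\right). \tag{25}$$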

Note that under the premise of Proposition 2, the smallest possible values of $\rho_i$ are $\rho_i = \nu_i$, which results in $\rho = \nu(H)$; with this choice of $\rho_i$, the risk bound for the regular recovery, as given by the right-hand side of (16), coincides within a factor of 2 with the risk bound for the penalized recovery with $\theta = 2$ as given by (25); both bounds assume that $H$ satisfies $H_{s,\infty}(\kappa)$ with $\kappa < 1/2$ and imply that

$$1\le p\le\infty\ \Rightarrow\ \mathrm{Risk}_p(\hat x(\cdot)|\epsilon,\sigma,s,\upsilon)\le 2\,\frac{s^{1/p}}{1-2\kappa}\left(s^{-1}\upsilon + 2\nu(H)\right). \tag{26}$$

When $\upsilon = 0$, the latter bound admits a quite transparent interpretation: everything is as if we were observing the sum of an unknown $s$-dimensional signal and an observation error of uniform norm $O(1)\nu(H)$.
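As an arithmetic illustration of (26) and of its interpretation, the sketch below evaluates the right-hand side for a few values of $p$; the numbers $s = 4$, $\kappa = 1/4$, $\upsilon = 0$, $\nu(H) = 1$ are arbitrary, and the function name `risk_bound_26` is ours. For fixed data the bound equals $s^{1/p}$ times its $p = \infty$ value $c$, which is exactly the $\ell_p$-norm of an $s$-dimensional vector whose entries all have magnitude $c$.

```python
import numpy as np


def risk_bound_26(p, s, kappa, upsilon, nu_H):
    """Right-hand side of (26): 2 * s**(1/p) / (1 - 2*kappa) * (upsilon/s + 2*nu_H).

    Meaningful for 1 <= p <= inf and kappa < 1/2; p = np.inf gives s**0 = 1.
    """
    if not kappa < 0.5:
        raise ValueError("bound (26) requires kappa < 1/2")
    return 2.0 * s ** (1.0 / p) / (1.0 - 2.0 * kappa) * (upsilon / s + 2.0 * nu_H)


s, kappa, upsilon, nu_H = 4, 0.25, 0.0, 1.0
c = risk_bound_26(np.inf, s, kappa, upsilon, nu_H)   # uniform-norm level, c == 8.0 here
for p in (1, 2, np.inf):
    # the l_p bound is s**(1/p) times the uniform level c
    print(p, risk_bound_26(p, s, kappa, upsilon, nu_H), s ** (1.0 / p) * c)
```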
3 Efficient construction of the contrast matrix H



In what follows, we fix $A$, the "environment parameters" $\epsilon$, $\sigma$, $U$ and the "level of sparsity" $s$ of signals $x$ we intend to recover, and are interested in building the contrast matrix $H = [h_1,\ldots,h_n]$ resulting in an error bound (26) as small as possible. All we need to this end is to answer the following question (where we should specify the norm $\varphi(\cdot)$ as $\nu_{\epsilon,\sigma,U}(\cdot)$):

(?) Let $\varphi(\cdot)$ be a norm on $\mathbb{R}^m$, and $s$ be a positive integer. What is the domain $G_s$ of pairs $(\omega,\kappa)\in\mathbb{R}^2_+$ such that $\kappa < 1/2$ and there exists a matrix $H = [h_1,\ldots,h_n]\in\mathbb{R}^{m\times n}$ satisfying the condition $H_{s,\infty}(\kappa)$ and the relation $\varphi(H) := \max_i\varphi(h_i)\le\omega$? How can such an $H$ be found, provided it exists?

Invoking Lemma 1, we can reformulate this question as follows:

(??) Let $\varphi(\cdot)$ and $s$ be as in (?). Given $(\omega,\kappa)\in\mathbb{R}^2_+$, how can one find vectors $h_i\in\mathbb{R}^m$, $1\le i\le n$, satisfying

$$(a):\ \varphi(h_i)\le\omega\quad\&\quad(b):\ |x_i|\le|h_i^TAx| + s^{-1}\kappa\|x\|_1\quad\forall x\in\mathbb{R}^n \tag{$P_i$}$$

for every $i$, or detect correctly that no such collection of vectors exists?

Indeed, by Lemma 1, if $H'$ satisfies $H_{s,\infty}(\kappa)$ and $\varphi(H')\le\omega$, then there exists $H = [h_1,\ldots,h_n]$ such that the $h_i$ satisfy $(P_i.b)$ for all $i$ and $\varphi(H)\le\varphi(H')\le\omega$, so that the $h_i$ satisfy $(P_i.a)$ for all $i$ as well. Vice versa, if the $h_i$ satisfy $(P_i)$, $1\le i\le n$, then the matrix $H = [h_1,\ldots,h_n]$ clearly satisfies $H_{s,\infty}(\kappa)$, and $\varphi(H)\le\omega$.

The answer to (??) is given by the following

Lemma 3 Given $\kappa > 0$, $\omega > 0$, and a positive integer $s$, let $\gamma = \kappa/s$. For every $i\le n$, the following three properties are equivalent to each other:

(i) There exists $h = h_i$ satisfying $(P_i)$;

(ii) The optimal value in the optimization problem

$$\mathrm{Opt}_i(\gamma) = \min_h\left\{\varphi(h) : \|A^Th - e_i\|_\infty\le\gamma\right\}, \tag{$P_i^\gamma$}$$

where $e_i$ is the $i$-th standard basic orth in $\mathbb{R}^n$, is $\le\omega$;

(iii) One has

$$\forall x\in\mathbb{R}^n:\quad |x_i|\le\omega\varphi_*(Ax) + \gamma\|x\|_1, \tag{27}$$

where $\varphi_*(u) = \max_{\varphi(v)\le 1}u^Tv$ is the norm on $\mathbb{R}^m$ conjugate to $\varphi(\cdot)$.

Whenever one (and then all) of these properties takes place, problem $(P_i^\gamma)$ is solvable, and its optimal solution $h_i$ satisfies $(P_i)$.

3.1 Optimal contrasts for regular and penalized recoveries 

As an immediate consequence of Lemma 3, we get the following description of the domain $G_s$ associated with the norm $\varphi(\cdot) = \nu_{\epsilon,\sigma,U}(\cdot)$:

$$\begin{aligned}
&(a)\quad G_s = \left\{(\omega,\kappa)\ge 0 : s^{-1}\kappa\ge\gamma_*,\ \omega\ge\omega_*(s^{-1}\kappa)\right\},\quad\text{where}\\
&(b)\quad \gamma_* = \max_{1\le i\le n}\min_h\|A^Th - e_i\|_\infty = \max_i\max_x\left\{x_i : \|x\|_1\le 1,\ Ax = 0\right\},\\
&(c)\quad \omega_*(\gamma) = \max_{1\le i\le n}\mathrm{Opt}_i(\gamma),
\end{aligned}\tag{28}$$

where $\varphi(\cdot)$ in $(P_i^\gamma)$ is specified as $\nu_{\epsilon,\sigma,U}(\cdot)$. Note that the second equality in (b) is given by Linear Programming duality. Indeed, by (b), $\gamma_*$ is the smallest $\gamma$ for which all problems $(P_i^\gamma)$, $i = 1,\ldots,n$, are feasible, and thus, by Lemma 3, $(\omega,\kappa)\in G_s$ if and only if $\kappa/s\ge\gamma_*$ and $\omega\ge\omega_*(\kappa/s)$.

Note that the quantity $\gamma_*$ depends solely on $A$, while $\omega_*(\cdot)$ depends on $\epsilon$, $\sigma$, $U$ as parameters, but is independent of $s$.
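The two expressions for $\gamma_*$ in (28.b) can be cross-checked numerically. A SciPy sketch with our own LP formulations and function names: `gamma_star` solves the right-hand side (maximize $x_i$ over $\|x\|_1\le 1$, $Ax = 0$), while `gamma_star_primal` solves the left-hand side (minimize $\|A^Th - e_i\|_\infty$); by LP duality the two values should agree up to solver tolerance.

```python
import numpy as np
from scipy.optimize import linprog


def gamma_star(A):
    """gamma_* via max_i max_x {x_i : ||x||_1 <= 1, A x = 0} (right-hand side of (28.b))."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    I = np.eye(n)
    # variables (x, t): -t <= x <= t and sum(t) <= 1
    A_ub = np.vstack([np.hstack([I, -I]), np.hstack([-I, -I]),
                      np.hstack([np.zeros(n), np.ones(n)])])
    b_ub = np.concatenate([np.zeros(2 * n), [1.0]])
    A_eq = np.hstack([A, np.zeros((m, n))])          # A x = 0
    best = 0.0
    for i in range(n):
        c = np.zeros(2 * n)
        c[i] = -1.0                                  # maximize x_i
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=np.zeros(m),
                      bounds=[(None, None)] * n + [(0, None)] * n, method="highs")
        best = max(best, -res.fun)
    return best


def gamma_star_primal(A):
    """gamma_* via max_i min_h ||A^T h - e_i||_inf (left-hand side of (28.b))."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    AT, ones = A.T, np.ones((n, 1))
    # variables (h, tau): -tau <= A^T h - e_i <= tau
    A_ub = np.vstack([np.hstack([AT, -ones]), np.hstack([-AT, -ones])])
    best = 0.0
    for i in range(n):
        e = np.zeros(n)
        e[i] = 1.0
        c = np.zeros(m + 1)
        c[-1] = 1.0                                  # minimize tau
        res = linprog(c, A_ub=A_ub, b_ub=np.concatenate([e, -e]),
                      bounds=[(None, None)] * m + [(0, None)], method="highs")
        best = max(best, res.fun)
    return best


print(gamma_star([[1.0, 1.0]]), gamma_star_primal([[1.0, 1.0]]))   # both approximately 0.5
```

For $A = [1\ 1]$ the kernel is spanned by $(1,-1)$, so both formulations give $\gamma_* = 1/2$; then $s\gamma_* < 1/2$ fails already for $s = 1$, and, per the scheme below, the routines cannot be justified for this (deliberately bad) matrix.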

The outlined results suggest the following scheme of building the contrast matrix $H$:

• we compute $\gamma_*$ by solving $n$ Linear Programming problems in (28.b); if $s\gamma_*\ge 1/2$, then $G_s$ does not contain points $(\omega,\kappa)$ with $\kappa < 1/2$, so that our recovery routines are not applicable (or, at least, we cannot justify them theoretically);

• when $s\gamma_* < 1/2$, the set $G_s$ is nonempty, and its Pareto frontier (the set of pairs $(\omega,\kappa)\in G_s$ such that $(\omega,\kappa)\ge(\omega',\kappa')\in G_s$ is possible if and only if $(\omega',\kappa') = (\omega,\kappa)$) is the curve $(\omega_*(\gamma), s\gamma)$, $\gamma_*\le\gamma < \frac{1}{2s}$. We choose a "working point" on this curve, that is, a point $\gamma\in[\gamma_*,\frac{1}{2s})$, and compute $\omega_*(\gamma)$ by solving the convex optimization programs $(P_i^\gamma)$, $i = 1,\ldots,n$, with $\varphi(\cdot)$ specified as $\nu_{\epsilon,\sigma,U}(\cdot)$. $\omega_*(\gamma)$ is nothing but the maximum, over $i$, of the optimal values of these problems, and the optimal solutions $h_i$ to the problems induce the matrix $H = H(\gamma) = [h_1,\ldots,h_n]$, which satisfies $H_{s,\infty}(s\gamma)$ and has $\nu(H)\le\omega_*(\gamma)$. By the reasoning which led us to (??),

$$\nu(H(\gamma)) = \omega_*(\gamma) = \min_{H'}\left\{\nu(H') : H'\text{ satisfies }H_{s,\infty}(s\gamma)\right\},$$

that is, $H = H(\gamma)$ is, for our purposes, the best among the contrast matrices satisfying $H_{s,\infty}(s\gamma)$. With this contrast matrix, the error bounds (26) for regular/penalized $\ell_1$ recoveries (in the former, $\rho_i = \nu_{\epsilon,\sigma,U}(h_i)$; in the latter, $\theta = 2$) read

$$1\le p\le\infty\ \Rightarrow\ \mathrm{Risk}_p(\hat x(\cdot)|\epsilon,\sigma,s,\upsilon)\le 2\,\frac{s^{1/p}}{1-2s\gamma}\left(s^{-1}\upsilon + 2\omega_*(\gamma)\right). \tag{29}$$

The outlined strategy does not explain how to choose $\gamma$. This issue could be resolved, e.g., as follows. We choose an upper bound on the sensitivity of the risk (29) to $\upsilon$, i.e., to the $\|\cdot\|_1$-deviation of a signal to be recovered from the set of $s$-sparse signals. For fixed $s$ and $p$, this sensitivity is proportional to $(1-2s\gamma)^{-1}$, so that an upper bound on the sensitivity translates into an upper bound $\gamma^+ < \frac{1}{2s}$ on $\gamma$. We can now choose $\gamma$ by minimizing the remaining term in the risk bound over $\gamma\in[\gamma_*,\gamma^+]$, which amounts to solving the optimization problem

$$\max_{r,\gamma}\left\{r : r\,\omega_*(\gamma)\le 1 - 2s\gamma,\ \gamma_*\le\gamma\le\gamma^+\right\}.$$

Observing that $\omega_*(\cdot)$ is, by its origin, a convex function, we can solve the resulting problem efficiently by bisection in $r$. A step of this bisection requires solving a univariate convex feasibility problem with an efficiently computable constraint and thus is easy, at least for moderate values of $n$.

4 Range of feasibility of condition $H_{s,\infty}(\kappa)$

We address the crucial question of what can be said about the magnitude of the quantity $\omega_*(\cdot)$, see (28), and of the risk bound (29). One way to answer it is just to compute the (efficiently computable!) quantity $\omega_*(\gamma)$ for a desired value of $\gamma$. Yet it is natural to know theoretical upper bounds on $\omega_*$ in some "reference" situations. Below, we provide three results of this type.

At this point, it makes sense to express in the notation that $\omega_*(\gamma)$ depends, as on parameters, on the sensing matrix $A$ and the "environment parameters" $\epsilon$, $\sigma$, $U$, so that in this section we write $\omega_*(\gamma|A,\epsilon,\sigma,U)$ instead of $\omega_*(\gamma)$.
4.1 Bounding via mutual incoherence

Recall that for an $m\times n$ sensing matrix $A = [A_1,\ldots,A_n]$ with no zero columns, its mutual incoherence is defined as

$$\mu(A) = \max_{1\le i\ne j\le n}\frac{|A_i^TA_j|}{A_j^TA_j}.$$

Compressed Sensing literature contains numerous mutual-incoherence-related results (see, e.g., [10, 11, 12] and references therein). To the best of our knowledge, all these results state that if $s$ is a positive integer and $A$ is a sensing matrix such that $\frac{s\mu(A)}{\mu(A)+1} < \frac12$, then $\ell_1$-based sparse recovery is well suited for recovering $s$-sparse signals (e.g., it recovers them exactly when there is no observation noise, admits explicit error bounds when there is noise and/or the signal is only nearly $s$-sparse, etc.). Moreover, as far as we know, all these results, up to the values of absolute constant factors in error bounds, are covered by the risk bounds (29) combined with the following immediate observation:

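Observation 1 Whenever $A = [A_1,\ldots,A_n]$ is an $m\times n$ matrix with no zero columns and $s$ is a positive integer, the matrix

$$H(A) = \frac{1}{\mu(A)+1}\left[\frac{A_1}{A_1^TA_1},\frac{A_2}{A_2^TA_2},\ldots,\frac{A_n}{A_n^TA_n}\right]$$

satisfies the condition $H_{s,\infty}\big(\frac{s\mu(A)}{\mu(A)+1}\big)$.

Verification is immediate: the diagonal entries in the matrix $Z = I - H^T(A)A$ are equal to $\gamma := 1 - \frac{1}{\mu(A)+1} = \frac{\mu(A)}{\mu(A)+1}$, while the magnitudes of the off-diagonal entries in $Z$ do not exceed $\gamma$. Therefore, for every $x\in\mathbb{R}^n$,

$$\gamma\|x\|_1\ge\|Zx\|_\infty = \|x - H^T(A)Ax\|_\infty\ge\|x\|_\infty - \|H^T(A)Ax\|_\infty,$$

i.e., $\|x\|_\infty\le\|H^T(A)Ax\|_\infty + \gamma\|x\|_1$, which means that $H(A)$ satisfies $H_{s,\infty}(s\gamma)$.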

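Observe that the Euclidean norms of the columns in $H(A)$ do not exceed

$$\frac{1}{(\mu(A)+1)\min_i\|A_i\|_2},$$

whence

$$\nu(H(A))\le\frac{r(U) + \sigma\sqrt{2\ln(n/\epsilon)}}{(\mu(A)+1)\min_i\|A_i\|_2},\quad\text{where } r(U) = \max_{u\in U}\|u\|_2.$$

In the notation from Section 3, our observations can be summarized as follows:

Corollary 1 For every $m\times n$ matrix $A$ with no zero columns, one has $\gamma_*\le\bar\gamma := \frac{\mu(A)}{\mu(A)+1}$ and

$$\omega_*(\bar\gamma|A,\epsilon,\sigma,U)\le\nu(H(A))\le\frac{r(U) + \sigma\sqrt{2\ln(n/\epsilon)}}{(\mu(A)+1)\min_i\|A_i\|_2}.$$

In particular,

$$s\le\frac{\mu(A)+1}{3\mu(A)}\ \Rightarrow\ \omega_*\Big(\frac{1}{3s}\Big|A,\epsilon,\sigma,U\Big)\le\frac{r(U) + \sigma\sqrt{2\ln(n/\epsilon)}}{(\mu(A)+1)\min_i\|A_i\|_2}.$$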

It should be added that as $m,n$ grow in such a way that $\ln(n)\le O(1)\ln m$, realizations $A$ of "typical" random $m\times n$ matrices (e.g., those with independent $N(0,1/m)$ entries or with independent entries taking values $\pm1/\sqrt m$) with overwhelming probability satisfy $\mu(A)\le O(1)\sqrt{\ln(n)/m}$ and $\|A_i\|_2\ge 0.9$ for all $i$. By Corollary 1, it follows that for these $A$ the condition $H_{s,\infty}(\kappa)$ with, say, $\kappa = 1/3$ can be satisfied for $s$ as large as $O(1)\sqrt{m/\ln(n)}$ merely by the choice $H = H(A)$, which ensures that $\nu(H)\le O(1)[r(U) + \sigma\sqrt{2\ln(n/\epsilon)}]$; in particular, in the indicated range of values of $s$ one has $\omega_*(\frac{1}{3s}|A,\epsilon,\sigma,U)\le O(1)[r(U) + \sigma\sqrt{2\ln(n/\epsilon)}]$.
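Observation 1 is easy to check numerically. A NumPy sketch (function names `mutual_incoherence` and `incoherence_contrast` are ours): it computes $\mu(A)$, builds $H(A)$, and verifies that $Z = I - H^T(A)A$ has diagonal $\gamma = \mu(A)/(\mu(A)+1)$ and entries bounded by $\gamma$, which is exactly the verification argument above. The Gaussian matrix is only an example of the "typical" random matrices mentioned in the text; the printed $\gamma$ and the largest admissible $s$ depend on the random draw.

```python
import numpy as np


def mutual_incoherence(A):
    """mu(A) = max_{i != j} |A_i^T A_j| / (A_j^T A_j), as in footnote 2 and Section 4.1."""
    A = np.asarray(A, dtype=float)
    G = A.T @ A                                  # Gram matrix, G[i, j] = A_i^T A_j
    R = np.abs(G) / np.diag(G)[None, :]          # divide column j by A_j^T A_j
    np.fill_diagonal(R, 0.0)                     # exclude i == j
    return float(R.max())


def incoherence_contrast(A):
    """Contrast matrix H(A) of Observation 1 and the constant gamma = mu/(mu+1)."""
    A = np.asarray(A, dtype=float)
    mu = mutual_incoherence(A)
    H = A / ((mu + 1.0) * np.sum(A * A, axis=0))[None, :]   # column i: A_i / ((mu+1) A_i^T A_i)
    return H, mu / (mu + 1.0)


rng = np.random.default_rng(0)
m, n = 200, 400
A = rng.standard_normal((m, n)) / np.sqrt(m)     # independent N(0, 1/m) entries
H, gamma = incoherence_contrast(A)
Z = np.eye(n) - H.T @ A
print(np.allclose(np.diag(Z), gamma))            # diagonal of Z equals gamma
print(np.abs(Z).max() <= gamma + 1e-12)          # all entries of Z bounded by gamma
s_max = int(np.ceil(0.5 / gamma)) - 1            # largest s with s * gamma < 1/2
print(gamma, s_max)
```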

4.2 The case of A satisfying the Restricted Isometry Property 

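Proposition 5 Let $A$ satisfy RIP($\delta$, $k$) with some $\delta\in(0,1)$ and with $k > 1$. Then there exists a matrix $H(A)$ which, for every positive integer $s$, satisfies the condition $H_{s,\infty}(s\gamma(\delta,k))$, with

$$\gamma(\delta,k) = \frac{\sqrt2\,\delta}{(1-\delta)\sqrt{k-1}}, \tag{30}$$

and is such that $\nu(H(A))\le\big(r(U) + \sigma\sqrt{2\ln(n/\epsilon)}\big)/\sqrt{1-\delta}$. In particular,

$$s\le\frac{1-\delta}{3\sqrt2\,\delta}\sqrt{k-1}\ \Rightarrow\ \omega_*\Big(\frac{1}{3s}\Big|A,\epsilon,\sigma,U\Big)\le\frac{1}{\sqrt{1-\delta}}\left[r(U) + \sigma\sqrt{2\ln(n/\epsilon)}\right]. \tag{31}$$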
4.3 Oracle inequality



Here we assume that $A\in\mathbb{R}^{m\times n}$ possesses the following property (where $S$ is a positive integer and $\omega > 0$):

$\mathcal{O}(S,\omega)$: For every $i\in\{1,\ldots,n\}$ and every $S$-element subset $I\ni i$ of $\{1,\ldots,n\}$ there exists a routine $\mathcal{R}_{i,I}$ for recovering $x_i$ from a noisy observation

$$y = Ax + u + \sigma\xi,\qquad[\xi\sim N(0,I_m),\ u\in U]$$

of an unknown signal $x\in\mathbb{R}^n$, known to be supported on $I$, such that for every such signal and every $u\in U$ one has

$$\mathrm{Prob}\left\{|\mathcal{R}_{i,I}(Ax+u+\sigma\xi) - x_i| > \omega\right\}\le\epsilon.$$

We intend to demonstrate that in this situation, for all $s$ in a certain range (which extends as $S$ grows and $\omega$ decreases), the uniform error of the regular and the penalized recoveries associated with a properly selected contrast matrix is, with probability $\ge 1-\epsilon$, "close" to $\omega$. The precise statement is as follows:

Proposition 6 Given $A$ and the "environment parameters" $\epsilon < 1/16$, $\sigma$, $U$, assume that $A$ satisfies the condition $\mathcal{O}(S,\omega)$ with certain $S$, $\omega$. Then for every integer $s$ from the range

$$1\le s\le\frac{\sigma\sqrt{2S\ln(1/\epsilon)}}{4\omega\|A\|} \tag{32}$$

(here $\|\cdot\|$ is the standard matrix norm, i.e., the largest singular value) there exists a contrast matrix $H$ satisfying the condition $H_{s,\infty}(\frac14)$ and such that $\nu(H)\le 2\sqrt{1+\ln(n)/\ln(1/\epsilon)}\,\omega$, so that in the outlined range of values of $s$ one has $\omega_*(\frac{1}{4s})\le 2\sqrt{1+\ln(n)/\ln(1/\epsilon)}\,\omega$, and the error bound (29) for the regular/penalized $\ell_1$ recovery associated with $H$ is

$$\mathrm{Risk}_p(\hat x(\cdot)|\epsilon,\sigma,s,\upsilon)\le 16\,s^{1/p}\left[\omega\sqrt{1+\frac{\ln n}{\ln(1/\epsilon)}} + \frac{\upsilon}{4s}\right]. \tag{33}$$
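To see where the constant 16 in the risk bound of Proposition 6 comes from, substitute $s\gamma = 1/4$ (i.e., $\gamma = \frac{1}{4s}$, matching $H_{s,\infty}(\frac14)$) and the bound $\omega_*(\frac{1}{4s})\le 2\sqrt{1+\ln(n)/\ln(1/\epsilon)}\,\omega$ into (29):

$$\mathrm{Risk}_p\le 2\,\frac{s^{1/p}}{1-\frac12}\left(\frac{\upsilon}{s} + 4\omega\sqrt{1+\frac{\ln n}{\ln(1/\epsilon)}}\right) = 16\,s^{1/p}\left[\omega\sqrt{1+\frac{\ln n}{\ln(1/\epsilon)}} + \frac{\upsilon}{4s}\right].$$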



Proposition 6 justifies, to some extent, our approach: it says that if there exists a routine which recovers $S$-sparse signals with an a priori known sparsity pattern within a certain accuracy (measured componentwise), then our recovery routines exhibit "close" performance without any knowledge of the sparsity pattern, albeit in a smaller range of values of the sparsity parameter.



4.4 Condition $H_{s,\infty}(\kappa)$: limits of performance

Recall that when recovering $s$-sparse signals, the condition $H_{s,\infty}(\kappa)$ helps only when $\kappa < 1/2$. Unfortunately, with these $\kappa$, the condition is feasible only in a severely restricted range of values of $s$. Specifically, from [15, Proposition 5.1] and Lemma 1 it immediately follows that

(*) If $A\in\mathbb{R}^{m\times n}$ is not "nearly square," that is, if $n > 2(2\sqrt m + 1)^2$, then the condition $H_{s,\infty}(\kappa)$ with $\kappa < 1/2$ cannot be satisfied when $s$ is "large," namely, when $s > 2\sqrt{2m} + 1$.

Note that from the discussion at the end of Section 4.1 we know that the "$O(\sqrt m)$ limit of performance" of the condition $H_{s,\infty}(\cdot)$ stated in (*) is "nearly sharp": when $s\le O(1)\sqrt{m/\ln(n)}$, the condition $H_{s,\infty}(\frac13)$ associated with a typical randomly generated $m\times n$ sensing matrix $A$ is feasible and can be satisfied with a contrast matrix $H$ with quite moderate $\nu(H)$.

(*) says that unless $A$ is nearly square, the condition $H_{s,\infty}(\cdot)$ can validate $\ell_1$ sparse recovery only in a severely restricted range $s\le O(\sqrt m)$ of values of the sparsity parameter. This is in sharp contrast with unverifiable sufficient conditions for "goodness" of $\ell_1$ recovery, like RIP: it is well known that when $m,n$ grow, realizations of "typical" random $m\times n$ matrices, like those mentioned at the end of Section 4.1, with overwhelming probability possess RIP($0.1$, $2s$) with $s$ as large as $O(m/\ln(2n/m))$. As a result, "unverifiable" sufficient conditions, like RIP, can justify the validity of $\ell_1$ recovery routines in a much wider (and in fact, the widest possible) range of values of the sparsity parameter $s$ than the "fully computationally tractable" condition $H_{s,\infty}(\cdot)$. This being said, note that this comparison is not completely fair. Indeed, aside from its tractability, the condition $H_{s,\infty}(\kappa)$ with $\kappa < 1/2$ ensures the error bounds (29) in the entire range $1\le p\le\infty$ of values of $p$, which perhaps is not the case with conditions like RIP. Specifically, consider the "no nuisance" case $U = \{0\}$, and let $A$ satisfy RIP($0.1$, $2S$) for a certain $S$. It is well known (see, e.g., the next section) that in this case the Dantzig Selector recovery ensures, for every $s\le S$ and every $s$-sparse signal $x$, that

$$\|\hat{x}_{DS}-x\|_p\le O(1)\sigma\sqrt{\ln(n/\epsilon)}\,s^{1/p},\quad 1\le p\le2,$$

with probability $\ge1-\epsilon$. However, we are not aware of similar bounds (under whatever conditions) for "large" $s$ and $p>2$. For comparison: in the case in question, for "small" $s$, namely, $s\le O(1)\sqrt{S}$, we have $\omega_*(\frac{1}{4s})\le O(1)\sigma\sqrt{\ln(n/\epsilon)}$ (by Proposition 5), whence for regular and penalized $\ell_1$ recoveries with an appropriately chosen contrast matrix (which can be built efficiently!) one has for all $s$-sparse $x$

$$\|\hat{x}-x\|_p\le O(1)\sigma\sqrt{\ln(n/\epsilon)}\,s^{1/p}\quad\forall p\in[1,\infty]$$

with probability $\ge1-\epsilon$ (see (29)). We wonder whether a similar bound (perhaps with extra logarithmic factors) can be obtained for large $s$ (e.g., $s>m^{1/2+\delta}$) for any $\ell_1$ recovery routine and any essentially nonsquare (say, $m\le n/2$) $m\times n$ sensing matrix $A$ with columns of Euclidean length $\le O(1)$.

5 Extensions 

We are about to demonstrate that the pivot element of the preceding sections, the condition $H_{s,\infty}(\kappa)$, is the strongest (and seemingly the only verifiable one) in a natural parametric series of conditions on a contrast matrix $H$; each of these conditions validates the regular and the penalized $\ell_1$ recoveries associated with $H$ in a certain restricted range of values of $p$ in the error bounds (29).

5.1 Conditions $H_{s,q}(\kappa)$

Let us fix an $m\times n$ sensing matrix $A$. Given a positive integer $s\le m$, a $q\in[1,\infty]$ and a real $\kappa>0$, let us say that an $m\times n$ contrast matrix $H$ satisfies condition $H_{s,q}(\kappa)$ if

$$\forall x\in\mathbb{R}^n:\ \|x\|_{s,q}\le s^{1/q}\|H^TAx\|_\infty+\kappa s^{1/q-1}\|x\|_1, \qquad (34)$$

where $\|x\|_{s,q}=\|x^s\|_q$ and $x^s$, as always, is the vector obtained from $x$ by zeroing all but the $s$ largest in magnitude entries. Observe that

• What used to be denoted $H_{s,\infty}(\kappa)$ before is exactly condition (34) with $q=\infty$ (recall that $\|x\|_{s,\infty}=\|x\|_\infty$);

• If $H$ satisfies $H_{s,q}(\kappa)$, then $H$ satisfies $H_{s,q'}(\kappa)$ for all $q'\in[1,q]$ (since for the $s$-sparse vector $x^s$ we have $\|x^s\|_{q'}\le s^{\frac{1}{q'}-\frac1q}\|x^s\|_q$).
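A small sketch may help make the quantities in (34) concrete. The Python below (standard library only; the helper names `s_q_norm`, `inv` and `monotonicity_holds` are our own, hypothetical ones) evaluates $\|x\|_{s,q}=\|x^s\|_q$ and numerically checks the inequality $\|x^s\|_{q'}\le s^{1/q'-1/q}\|x^s\|_q$ behind the second observation. It only evaluates norms of given vectors; it does not verify $H_{s,q}(\kappa)$ for a contrast matrix, which for $q<\infty$ is in general not known to be efficiently checkable.

```python
import math


def s_q_norm(x, s, q):
    """||x||_{s,q} = ||x^s||_q: the l_q norm of the s largest-magnitude entries of x."""
    top = sorted((abs(t) for t in x), reverse=True)[:s]
    if not top:
        return 0.0
    if math.isinf(q):
        return top[0]
    return sum(t ** q for t in top) ** (1.0 / q)


def inv(q):
    """1/q with the convention 1/inf = 0."""
    return 0.0 if math.isinf(q) else 1.0 / q


def monotonicity_holds(x, s, q_small, q_big, tol=1e-12):
    """Check ||x^s||_{q'} <= s^(1/q' - 1/q) * ||x^s||_q for 1 <= q' <= q."""
    lhs = s_q_norm(x, s, q_small)
    rhs = s ** (inv(q_small) - inv(q_big)) * s_q_norm(x, s, q_big)
    return lhs <= rhs + tol
```

For instance, with x = [3, -1, 4, -1, 5, -9, 2, 6] and s = 3 the three largest magnitudes are 9, 6 and 5, so $\|x\|_{3,1}=20$ and $\|x\|_{3,\infty}=9$.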

Less immediate observations are as follows:

• Let $A$ be an $m\times n$ matrix and let $s\le n$ be a positive integer. We say that $A$ is $s$-good if for all $s$-sparse $x\in\mathbb{R}^n$ the $\ell_1$-recovery

$$\hat{x}\in\mathop{\mathrm{Argmin}}_v\{\|v\|_1:\ Av=y\}$$

is exact in the case of noiseless observation $y=Ax$. It turns out that feasibility of $H_{s,1}(\kappa)$ with $\kappa<\frac12$ is intimately related to $s$-goodness of $A$:

Lemma 4 $A$ is $s$-good if and only if there exist $\kappa<\frac12$ and $H\in\mathbb{R}^{m\times n}$ satisfying $H_{s,1}(\kappa)$.

• The Restricted Isometry Property implies feasibility of $H_{s,2}(\kappa)$ with small $\kappa$:

Lemma 5 Let $A$ satisfy RIP$(\delta,2s)$ with $\delta<\frac13$. Then the matrix $H=\frac{1}{1-\delta}A$ satisfies the condition $H_{s,2}(\kappa)$ with $\kappa=\frac{\delta}{1-\delta}<\frac12$.
5.2 Regular and penalized $\ell_1$ recoveries with contrast matrices satisfying $H_{s,q}(\kappa)$

Our immediate goal is to obtain the following extension of the main results of Section 2, specifically, Propositions 2 and 4:

Proposition 7 Assume we are given an $m\times n$ sensing matrix $A=[a_1,\dots,a_n]$, an integer $s\le m$, $\kappa<1/2$, a contrast matrix $H=[h_1,\dots,h_n]\in\mathbb{R}^{m\times n}$, and $q\in[1,\infty]$ such that $H$ satisfies the condition $H_{s,q}(\kappa)$. Denote $\nu_i=\nu_{\epsilon,\sigma,\mathcal{U}}(h_i)$, where the norm $\nu_{\epsilon,\sigma,\mathcal{U}}(\cdot)$ is defined in (4), and $\nu(H)=\max_i\nu_i$. Let also the noise intensity $\sigma$, the uncertainty set $\mathcal{U}$ and the tolerance $\epsilon\in(0,1)$ be given.

(i) Consider the regular recovery (7) with the contrast matrix $H$ and the parameters $\rho_i$ satisfying the relations

$$\rho_i\ge\nu_i,\quad 1\le i\le n,$$

and let $\rho=\max_i\rho_i$. Then

$$1\le p\le q\ \Rightarrow\ \mathrm{Risk}_p(\hat{x}_{\rm reg}(\cdot)|\epsilon,\sigma,s,v)\le(3s)^{1/p}\,\frac{\nu(H)+\rho+s^{-1}v}{1-2\kappa}. \qquad (35)$$

(ii) Consider the penalized recovery (17) with the contrast matrix $H$ and $\theta=2$. Then

$$1\le p\le q\ \Rightarrow\ \mathrm{Risk}_p(\hat{x}_{\rm pen}(\cdot)|\epsilon,\sigma,s,v)\le3s^{1/p}\,\frac{2\nu(H)+s^{-1}v}{1-2\kappa}. \qquad (36)$$

5.3 Error bounds for Lasso and Dantzig Selector under condition $H_{s,q}(\kappa)$

We are about to demonstrate that the feasibility of condition $H_{s,q}(\kappa)$ with $\kappa<\frac12$ has some consequences for the performance of Lasso and Dantzig Selector when recovering $s$-sparse signals in $\|\cdot\|_p$ norms, $1\le p\le q$. This might look strange at first glance, since neither Lasso nor Dantzig Selector uses contrast matrices. The surprise, however, is eliminated by the following observation:

(!) Let $H$ satisfy $H_{s,q}(\kappa)$ and let $\lambda$ be the maximum of the Euclidean norms of columns in $H$. Then

$$\forall x\in\mathbb{R}^n:\ \|x\|_{s,q}\le\lambda s^{1/q}\|Ax\|_2+\kappa s^{1/q-1}\|x\|_1 \qquad (37)$$

(indeed, $\|H^TAx\|_\infty=\max_i|h_i^TAx|\le\lambda\|Ax\|_2$ by the Cauchy–Schwarz inequality, so (37) follows from (34)).

The fact that a condition like (37) with $\kappa<1/2$ plays a crucial role in the performance analysis of Lasso and Dantzig Selector is neither surprising nor particularly novel. For example, the standard error bounds for the latter algorithms under the RIP assumption are in fact based on the validity of (37) with $\lambda=O(1)$ for $q=2$ (see Lemma 5). Another example is given by the Restricted Eigenvalue [2] and the Compatibility conditions [3, 19]. Specifically, the Restricted Eigenvalue condition RE$(s,\rho,\chi)$ ($s$ is a positive integer, $\rho>1$, $\chi>0$) states that

$$\|x^s\|_2\le\frac{1}{\chi}\|Ax\|_2\quad\text{whenever}\quad\rho\|x^s\|_1\ge\|x-x^s\|_1,$$

whence $\|x^s\|_1\le\frac{\sqrt{s}}{\chi}\|Ax\|_2$ whenever $(\rho+1)\|x^s\|_1\ge\|x\|_1$, so that

$$\forall x\in\mathbb{R}^n:\ \|x^s\|_1\le\frac{s^{1/2}}{\chi}\|Ax\|_2+\frac{1}{1+\rho}\|x\|_1. \qquad (38)$$

Further, the Compatibility condition of [19] is nothing but (38) with $\rho=3$. We see that both the Restricted Eigenvalue and the Compatibility conditions imply (37) with $q=1$, $\lambda=(\chi\sqrt{s})^{-1}$ and a certain $\kappa<1/2$.

We are about to present a simple result on the performance of Lasso and Dantzig Selector algorithms in 
the case when A satisfies the condition (37). The result is as follows: 






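Proposition 8 Let the $m\times n$ matrix $A=[a_1,\dots,a_n]$ satisfy (37) with $\kappa<\frac12$ and some $q\in[1,\infty]$, and let $\beta=\max_i\|a_i\|_2$. Let also the "environment parameters" $\sigma>0$, $\epsilon\in(0,1)$ be given, and let there be no nuisance: $\mathcal{U}=\{0\}$.

(i) Consider the Dantzig Selector recovery

$$\hat{x}_{DS}(y)\in\mathop{\mathrm{Argmin}}_v\left\{\|v\|_1:\ \|A^T(Av-y)\|_\infty\le\rho\right\},$$

where

$$\rho\ge\varrho:=\sigma\beta\sqrt{2\ln(n/\epsilon)}. \qquad (39)$$

Then

$$1\le p\le q\ \Rightarrow\ \mathrm{Risk}_p(\hat{x}_{DS}(\cdot)|\epsilon,\sigma,s,v)\le\frac{2(3s)^{1/p}}{1-2\kappa}\left[\frac{2s\lambda^2(\rho+\varrho)}{1-2\kappa}+s^{-1}v\right].$$

(ii) Consider the Lasso recovery

$$\hat{x}_{\rm lasso}(y)\in\mathop{\mathrm{Argmin}}_v\left\{\|v\|_1+\chi\|Av-y\|_2^2\right\}$$

and let $\chi$ satisfy the relation

$$2\kappa+2\varrho\chi<1, \qquad (40)$$

where $\varrho$ is given by (39). Then

$$1\le p\le q\ \Rightarrow\ \mathrm{Risk}_p(\hat{x}_{\rm lasso}(\cdot)|\epsilon,\sigma,s,v)\le\frac{4s^{1/p}}{1-2\kappa-2\varrho\chi}\left[\frac{2s\lambda^2}{\chi}+s^{-1}v\right]. \qquad (41)$$

In particular, with

$$\chi=\frac{1-2\kappa}{4\varrho} \qquad (42)$$

one has

$$1\le p\le q\ \Rightarrow\ \mathrm{Risk}_p(\hat{x}_{\rm lasso}(\cdot)|\epsilon,\sigma,s,v)\le\frac{8s^{1/p}}{1-2\kappa}\left[\frac{8s\varrho\lambda^2}{1-2\kappa}+s^{-1}v\right]. \qquad (43)$$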
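To illustrate the two ingredients of part (ii), the sketch below computes $\varrho$ from (39) and the penalty (42), and approximately solves the Lasso problem $\min_v\{\|v\|_1+\chi\|Av-y\|_2^2\}$ by proximal gradient (ISTA) iterations. ISTA is our choice of solver for illustration, not the one used in the paper (the experiments of Section 6 used Mosek), and all function names are hypothetical. The step size uses $\|A\|_F^2$ as a cheap upper bound on $\|A\|_2^2$, which keeps the step at most $1/L$ for the Lipschitz constant $L=2\chi\|A\|_2^2$ of the gradient of the quadratic term.

```python
import math


def varrho(A, sigma, eps):
    """varrho = sigma * beta * sqrt(2 ln(n/eps)), beta = max column norm of A, cf. (39)."""
    m, n = len(A), len(A[0])
    beta = max(math.sqrt(sum(A[i][j] ** 2 for i in range(m))) for j in range(n))
    return sigma * beta * math.sqrt(2.0 * math.log(n / eps))


def chi_theory(kappa, rho_bar):
    """Penalty choice (42): chi = (1 - 2 kappa) / (4 varrho); it satisfies (40)."""
    return (1.0 - 2.0 * kappa) / (4.0 * rho_bar)


def soft(t, thr):
    """Soft thresholding: the prox of thr * |.| evaluated at t."""
    return math.copysign(max(abs(t) - thr, 0.0), t)


def lasso_ista(A, y, chi, iters=2000):
    """Approximately minimize ||v||_1 + chi * ||A v - y||_2^2 by proximal gradient (ISTA)."""
    m, n = len(A), len(A[0])
    fro2 = sum(a * a for row in A for a in row)  # ||A||_F^2 >= ||A||_2^2
    step = 1.0 / (2.0 * chi * fro2)              # <= 1/L with L = 2 chi ||A||_2^2
    v = [0.0] * n
    for _ in range(iters):
        r = [sum(A[i][j] * v[j] for j in range(n)) - y[i] for i in range(m)]      # A v - y
        g = [2.0 * chi * sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]  # gradient of quadratic term
        v = [soft(v[j] - step * g[j], step) for j in range(n)]                    # prox of step * ||.||_1
    return v
```

With $A=I$ the problem separates and its solution is the soft-thresholding $v_i=\mathrm{sign}(y_i)[|y_i|-1/(2\chi)]_+$, which gives an easy sanity check; for instance, with $\chi=1$ and $y=(3,0.2,-2)$ the minimizer is $(2.5,0,-1.5)$.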



Discussion. Let us compare the error bounds given by Propositions 7 and 8. Assume that there is no nuisance ($\mathcal{U}=\{0\}$) and $A$ is such that the condition $H_{s,q}(\frac14)$ is satisfied by a certain matrix $H$, the maximum of the Euclidean norms of the columns of $H$ being $\lambda$. Assuming that the penalized recovery uses $\theta=2$, and the regular recovery uses $\rho=\nu(H)=\lambda\sigma\sqrt{2\ln(n/\epsilon)}$, the associated risk bounds as given by Proposition 7 become

$$\mathrm{Risk}_p(\hat{x}(\cdot)|\epsilon,\sigma,s,v)\le O(1)s^{1/p}\left[\lambda\sigma\sqrt{2\ln(n/\epsilon)}+s^{-1}v\right],\quad 1\le p\le q. \qquad (44)$$

Note that these bounds admit a transparent interpretation: in the range $1\le p\le q$ an $s$-sparse signal is recovered as if we were correctly identifying its support and estimating the entries with the uniform error $O(1)\lambda\sigma\sqrt{2\ln(n/\epsilon)}$.

Now, as we have already explained, the existence of a matrix $H$ satisfying $H_{s,q}(\frac14)$ with columns of Euclidean lengths $\le\lambda$ implies validity of (37) with $\kappa=\frac14$. Assuming that in Dantzig Selector one uses $\rho=\varrho$, and that $\chi$ in Lasso is chosen according to (42), the error bounds for Dantzig Selector and Lasso as given by Proposition 8 become

$$\mathrm{Risk}_p(\hat{x}(\cdot)|\epsilon,\sigma,s,v)\le O(1)s^{1/p}\left[(\beta\lambda)\,s\lambda\sigma\sqrt{2\ln(n/\epsilon)}+s^{-1}v\right],\quad 1\le p\le q. \qquad (45)$$



Observe that $\beta\lambda\ge O(1)$ (look at what happens with (37) when $x$ is the $i$-th basic orth). We see that the bounds (45) are worse than the bounds (44), primarily due to the presence of the factor $s$ in the first bracketed term in (45). At this point it is unclear whether this drawback is an artifact caused by a poor analysis of the Dantzig Selector and Lasso algorithms or whether it indeed "reflects reality." Some related numerical results presented in Section 6.1 suggest that the latter option could be the actual one.

Moreover, consider an example of the recovery problem with a $2\times2$ matrix $A$ with unit columns and singular values 1 and $\epsilon$. It can easily be seen that if $x$ is aligned with the second right singular vector of $A$ (corresponding to the singular value $\epsilon$), the error of the Dantzig Selector may be as large as $O(\epsilon^{-2}\sigma)$, while the error of "$H$-conscious" recovery will be $O(\epsilon^{-1}\sigma)$ up to a logarithmic factor in $\epsilon$ (indeed, choosing $H=A^{-T}$, so that $H^TA=I$, results in $\lambda=O(\epsilon^{-1})$). This toy example suggests that the extra $\lambda$ factor in the bound (45), at least for Dantzig Selector, is not only due to our clumsy analysis.

This being said, it should be stressed that a comparison of regular/penalized $\ell_1$ recoveries with Dantzig Selector and Lasso based solely on the above error bounds is somewhat biased against Dantzig Selector and Lasso. Indeed, in order for regular/penalized $\ell_1$ recoveries to enjoy their "good" error bounds, we should specify the required contrast matrix, which is not the case for Lasso and Dantzig Selector: the bounds (45) require only the existence of such a matrix$^{5}$. Besides this, there is at least one case where error bounds for Dantzig Selector are as good as (44), specifically, the case when $A$ possesses, say, RIP(0.1, 2s). Indeed, in this case, by Lemma 5, the matrix $H=O(1)A$ satisfies $H_{s,2}(\frac14)$, meaning that Dantzig Selector with properly chosen $\rho$ is nothing but the regular recovery with contrast matrix $H$ and as such obeys the bounds (44) with $q=2$.

It is time to point out that the above discussion is somewhat scholastic: when $q<\infty$ and $s$ is nontrivial, we do not know how to verify efficiently that the condition $H_{s,q}(\kappa)$ is satisfied by a given $H$, let alone how to synthesize efficiently an $H$ satisfying this condition. One should not think that these tractability issues concern only our algorithms, which need a good contrast matrix. In fact, all conditions which allow one to validate Dantzig Selector and Lasso beyond the scope of the "fully tractable" condition $H_{s,\infty}(\kappa)$ are, to the best of our knowledge, unverifiable: they cannot be checked efficiently, and thus we can never be sure that Lasso and Dantzig Selector (or any other known computationally efficient technique for sparse recovery) indeed work well for a given sensing matrix. As we have seen in Section 3, the situation improves dramatically when passing from the unverifiable conditions $H_{s,q}(\kappa)$, $q<\infty$, to the efficiently verifiable condition $H_{s,\infty}(\kappa)$, although in a severely restricted range of values of $s$.

6 Numerical examples 

We present here a small simulation study. 

6.1 Regular/penalized recovery vs. Lasso: no-nuisance case 

To illustrate the discussion in Section 5.3, we compare the numerical performance of Lasso and penalized recovery in the observation model (1) without nuisance:

$$y=Ax+\sigma\xi,\quad\xi\sim\mathcal{N}(0,I_m),$$

where $\sigma>0$ is known. The sensing matrix $A$ is specified by selecting at random $m=120$ rows of the $128\times128$ Hadamard matrix$^{6}$, and "suppressing" the first of the selected rows by multiplying it by 1.e-3. The resulting $120\times128$ sensing matrix has orthogonal rows; 119 of its 120 singular values are equal to $8\sqrt2$, and the remaining singular value is $0.008\sqrt2$.
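These claims about the sensing matrix are easy to check. The sketch below (standard-library Python; the function names and the fixed random seed are our own assumptions, since the text does not say how the rows were drawn) builds the $128\times128$ Hadamard matrix by the recurrence of footnote 6, keeps 120 randomly chosen rows, and scales the first kept row by $10^{-3}$. Because the rows stay mutually orthogonal, the singular values of the result are simply its row norms: $\sqrt{128}=8\sqrt2$ for the untouched rows and $10^{-3}\cdot8\sqrt2=0.008\sqrt2$ for the suppressed one.

```python
import math
import random


def hadamard(k):
    """Sylvester recurrence: H_0 = [1], H_{k+1} = [[H_k, H_k], [H_k, -H_k]]."""
    H = [[1]]
    for _ in range(k):
        H = [row + row for row in H] + [row + [-a for a in row] for row in H]
    return H


def suppressed_sensing_matrix(m=120, k=7, factor=1e-3, seed=0):
    """Pick m distinct rows of the 2^k x 2^k Hadamard matrix at random; scale the first by `factor`."""
    rng = random.Random(seed)
    H = hadamard(k)
    rows = rng.sample(range(len(H)), m)
    A = [[float(a) for a in H[r]] for r in rows]
    A[0] = [factor * a for a in A[0]]
    return A


def row_norms(A):
    """For a matrix with mutually orthogonal rows these are exactly its singular values."""
    return [math.sqrt(sum(a * a for a in row)) for row in A]
```

The same row-norm argument shows why a single suppressed row produces exactly one small singular value while leaving the other 119 unchanged.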

We have processed $A$ as explained in Section 3 (the reader is referred to this section for the description of the entities involved).$^{7}$ We started with computing $\gamma_*$, which turned out to be 0.0287, meaning that the level of $s$-goodness of $A$ is at least 17. In our experiment, we aimed at recovering signals with at most $s=10$ nonzero entries and with no nuisance ($\mathcal{U}=\{0\}$). The synthesis of the corresponding "optimal" contrast matrix $H=H_*$ as outlined in Section 3 results in $\gamma=0.294$, $\omega_*(\gamma)=0.0899\sigma\sqrt{2\ln(n/\epsilon)}$. Note that we are in the case of $\mathcal{U}=\{0\}$, and in this case the optimal $H$ is independent of the values of $\sigma$ and $\epsilon$.

5 And even less than that, since feasibility of $H_{s,q}(\kappa)$ is just a sufficient condition for the validity of (37), the condition which indeed underlies Proposition 8.

6 The $k$-th Hadamard matrix $H_k$ is given by the recurrence $H_0=1$, $H_{k+1}=[H_k, H_k; H_k, -H_k]$. It is a $2^k\times2^k$ matrix with orthogonal rows and all entries equal to $\pm1$.

Recovery    σ       χ          ‖x̂−x‖_1   ‖x̂−x‖_2   ‖x̂−x‖_∞
Penalized   1.e-4   N/A        2.1e-4    6.5e-5    3.8e-5
Lasso       1.e-4   3.74e-3*   2.1e-4    5.2e-4    3.9e-5
Lasso       1.e-4   4.01e-2†   1.6e-3    5.2e-4    2.0e-4
Penalized   1.e-5   N/A        2.2e-5    6.0e-6    2.7e-6
Lasso       1.e-5   4.78e-4*   3.1e-5    8.1e-6    3.4e-6
Lasso       1.e-5   4.01e-3†   1.8e-4    5.8e-5    2.1e-5
Penalized   1.e-6   N/A        2.1e-6    6.2e-7    2.5e-7
Lasso       1.e-6   1.10e-4*   8.8e-6    1.6e-6    5.9e-7
Lasso       1.e-6   4.01e-4†   1.8e-5    5.4e-6    1.9e-6

Table 1: Lasso vs. penalized $\ell_1$ recovery. Choice of $\chi$: * "ideal" choice; † theoretical choice.

We compare the penalized $\ell_1$-recovery with the contrast matrix $H_*$ and $\theta=2$ with the Lasso recovery on randomly generated signals $x$ with 10 nonzero entries. We consider two choices of the penalty $\chi$ in Lasso: the "theoretically optimal" choice (42) and the "ideal" choice, where we scanned the "fine grid" $(1.05)^k$, $k=0,\pm1,\pm2,\dots$ of values of $\chi$ and selected the value for which the Lasso recovery was at the smallest $\|\cdot\|_1$-distance from the true signal. The confidence parameter $\epsilon$ in (42) was set to 0.01.

The results of a typical experiment are presented in Table 1. We see that, as compared to the penalized $\ell_1$ recovery, the accuracy of Lasso with the theoretically optimal choice of the penalty is nearly 10 times worse. With the "ideal" (unrealistic!) choice of penalty, Lasso is never better than the penalized $\ell_1$ recovery, and for the smallest value of $\sigma$ it is nearly 4 times worse than the latter routine.

6.2 The nuisance case 

In the second experiment we study the behavior of recovery procedures in the situation when an "input nuisance" is present:

$$y=A(x+v)+\sigma\xi,$$

where $x\in\mathbb{R}^n$ is an unknown sparse signal, $v\in V$ with known $V\subset\mathbb{R}^n$, $\sigma$ is known and $\xi\in\mathbb{R}^m$ is standard normal, $\xi\sim\mathcal{N}(0,I_m)$; in terms of (3), $u=Av$ and $\mathcal{U}=AV$. We compare the performance of the regular and penalized recoveries to that of the Lasso and Dantzig Selector algorithms. To handle the nuisance, the latter methods were modified as follows: instead of the standard Lasso estimator we use the estimator

$$\hat{x}_{\rm lasso}(y)\in\mathop{\mathrm{Argmin}}_{x}\min_{v\in V}\left\{\|x\|_1+\chi\|A(x+v)-y\|_2^2\right\},$$

where the penalization coefficient $\chi$ is chosen according to [2, Theorem 4.1]; in turn, the Dantzig Selector is substituted by

$$\hat{x}_{DS}(y)\in\mathop{\mathrm{Argmin}}_{x\in\mathbb{R}^n}\min_{v\in V}\left\{\|x\|_1:\ |[A^T(A(x+v)-y)]_i|\le\varrho_i,\ i=1,\dots,n\right\}$$

7 It is worth mentioning that when $A$ is comprised of (perhaps scaled) rows of a Hadamard matrix (and in fact, of scaled rows of any other Fourier transform matrix associated with a finite Abelian group), the synthesis described in Section 3 simplifies dramatically due to the fact that all problems $(P_i^\gamma)$ turn out to be equivalent to each other, and their optimal solutions are obtained from each other by simple linear transformations. As a result, we can work with a single one of the problems $(P_i^\gamma)$ instead of working with all $n$ of them.
with $\varrho_i=\sigma\sqrt{2\ln(n/\epsilon)}\|A_i\|_2$, where $A_i$ are the columns of $A$ and $\epsilon$ is given (in what follows, $\epsilon=0.01$). We present below the simulation results for two setups with $n=256$:

1. Gaussian setup: a $161\times256$ sensing matrix $A_{\rm Gauss}$ with independent $\mathcal{N}(0,1)$ entries is generated, then its columns are normalized. The nuisance set $V=V(L)\subset\mathbb{R}^{256}$ is as follows:

$$V(L)=\{v\in\mathbb{R}^{256}:\ |v_{i+1}-2v_i+v_{i-1}|\le L,\ i=2,\dots,255,\ v_2=v_1=0\},$$

where $L$ is a known parameter; in other words, we observe the sum of a sparse signal and a "smooth background."

2. Convolution setup: a $240\times256$ sensing matrix $A_{\rm conv}$ is constructed as follows: consider a signal $x$ "living" on $\mathbb{Z}^2$ and supported on the $16\times16$ grid $\Gamma=\{(i,j)\in\mathbb{Z}^2:\ 0\le i,j\le15\}$. We subject such a signal to discrete time convolution with a kernel supported on the set $\{(i,j)\in\mathbb{Z}^2:\ -7\le i,j\le7\}$, and then restrict the result to the $16\times15$ grid $\Gamma_+=\{(i,j)\in\Gamma:\ 1\le j\le15\}$. This way we obtain a linear mapping $x\mapsto A_{\rm conv}x:\ \mathbb{R}^{256}\to\mathbb{R}^{240}$. The nuisance set $V=V(L)\subset\mathbb{R}^{256}$ is composed of zero-mean signals $u$ on $\Gamma$ which satisfy

$$|[D^2u]_{i,j}|\le L,$$

where $D$ is the discrete (periodic) homogeneous Laplace operator:

$$[Du]_{i,j}=\frac14\left(u_{i-1,j}+u_{i+1,j}+u_{i,j-1}+u_{i,j+1}-4u_{i,j}\right),\quad i,j=1,\dots,16,$$

with the indices taken modulo 16.
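The two nuisance sets can be encoded as simple membership tests. In the sketch below (hypothetical function names, 0-based indices) `in_V_gauss` checks the second-difference constraints defining $V(L)$ in the Gaussian setup, and `in_V_conv` checks the zero-mean and $|[D^2u]_{i,j}|\le L$ constraints of the convolution setup with periodic (wrap-around) indexing. The normalizing factor $\frac14$ in the Laplace operator is an assumption on our part, so it is exposed as the parameter `scale`.

```python
def in_V_gauss(v, L, tol=1e-12):
    """v in V(L): v_1 = v_2 = 0 and |v_{i+1} - 2 v_i + v_{i-1}| <= L for i = 2..n-1 (1-based)."""
    if abs(v[0]) > tol or abs(v[1]) > tol:
        return False
    return all(abs(v[i + 1] - 2.0 * v[i] + v[i - 1]) <= L + tol for i in range(1, len(v) - 1))


def laplacian_periodic(u, scale=0.25):
    """Periodic 5-point Laplacian on an N x N grid; scale = 1/4 is an assumed normalization."""
    N = len(u)
    return [[scale * (u[(i - 1) % N][j] + u[(i + 1) % N][j] + u[i][(j - 1) % N]
                      + u[i][(j + 1) % N] - 4.0 * u[i][j]) for j in range(N)]
            for i in range(N)]


def in_V_conv(u, L, tol=1e-9):
    """u in V(L): zero mean and |[D^2 u]_{ij}| <= L at every grid point."""
    N = len(u)
    mean = sum(map(sum, u)) / (N * N)
    d2u = laplacian_periodic(laplacian_periodic(u))
    return abs(mean) <= tol and all(abs(t) <= L + tol for row in d2u for t in row)
```

For example, a $\pm1$ checkerboard on the $16\times16$ grid has zero mean and satisfies $Du=-2u$ with `scale=0.25`, hence $D^2u=4u$, so it belongs to $V(4)$ but not to $V(3.9)$.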

In the simulations we acted as follows: given the sensing matrix $A$, the nuisance set $\mathcal{U}=AV$ and the values of $s$ and $\sigma$, we compute the contrast matrix $H$ by choosing a "reasonable" value $\gamma\ge\gamma_*$ of $\gamma$ and specifying $H$ as the matrix satisfying $H_{s,\infty}(s\gamma)$ and such that $\nu(H)=\omega_*(\gamma)$, see Section 3. Then $N$ samples of a random signal $x$, a random nuisance $v\in V$ and a random perturbation $\xi$ were generated, and the corresponding observations were processed by each of the algorithms we are comparing$^{8}$. The plots below present the average, over these $N=100$ experiments, $\ell_\infty$ and $\ell_1$ recovery errors. All recovery procedures used the Mosek optimization software [1].

We start with the Gaussian setup, in which the signal $x$ has $s=2$ nonvanishing components, randomly drawn, with $\|x\|_1=10$. For the penalized and the regular recovery algorithms the contrast matrix $H$ was computed using $\gamma=0.1$. In Figure 1 we plot the average recovery error as a function of the value of the parameter $L$ of the nuisance set $V$, for fixed $\sigma=0.1$, and in Figure 2, as a function of $\sigma$ for fixed $L=0.01$.

In the next experiment we fix the "environmental parameters" $\sigma$, $L$ and vary the number $s$ of nonzero entries in the signal $x$ (of norm $\|x\|_1=5s$). In Figure 3 we present the recovery error as a function of $s$.

We ran the same simulations in the convolution setup. The contrast matrix $H$ for the penalized and the regular recoveries is computed using $\gamma=0.2$. In Figure 4 we plot the average recovery error as a function of the "size" $L$ of the nuisance set $V$ for fixed $\sigma=0.1$, in Figure 5, as a function of $\sigma$ for fixed $L=0.01$, and in Figure 6, as a function of $s$.

We observe quite different behavior of the recovery procedures in our two setups. In the Gaussian setup the nuisance signal $v\in V$ does not mask the true signal $x$, and the performance of the Lasso and Dantzig Selector is quite good in this case. The situation changes dramatically in the convolution setup, where the performance of the Lasso and Dantzig Selector degrades rapidly when the parameter $L$ of the nuisance set increases.$^{9}$ The conclusion suggested by the outlined numerical results is that the penalized $\ell_1$ recovery, while sometimes losing slightly to Lasso, in some of the experiments significantly outperforms all other algorithms we are comparing.

8 Randomness of the sparse signal $x$ is important. Using the techniques of [14], one can verify that in the convolution setup there are signals with only 3 nonvanishing components which cannot be recovered by $\ell_1$ minimization even in the noiseless case $V=\{0\}$, $\sigma=0$. In other words, the $s$-goodness characteristic of the corresponding matrix $A$ is equal to 2.

9 The error plot for these estimators in Figure 4 flattens for higher values of $L$ simply because they always underestimate the signal, and the error of recovery is always less than the corresponding norm of the signal.
Figure 1 (panels: $\ell_\infty$-error, $\ell_1$-error): Mean recovery error as a function of the nuisance magnitude $L$. Gaussian setup parameters: $\sigma=0.1$, $s=2$, $\gamma=0.1$, $\|x\|_1=10$.

Figure 2 (panels: $\ell_\infty$-error, $\ell_1$-error): Mean recovery error as a function of the noise StD $\sigma$. Gaussian setup parameters: $L=0.01$, $s=2$, $\gamma=0.1$, $\|x\|_1=10$.

7 Non-Euclidean matching pursuit algorithm 

The Matching Pursuit algorithm for sparse recovery is motivated by the desire to provide a reduced-complexity alternative to the algorithms using $\ell_1$-minimization. Several implementations of Matching Pursuit have been proposed in the Compressive Sensing literature (see, e.g., [11, 10, 12]). They are based on successive Euclidean projections of the signal, and the corresponding performance results rely upon bounds on the mutual incoherence parameter $\mu(A)$ of the sensing matrix. We are about to show how the construction of Section 3 can be used to design a specific version of the Matching Pursuit algorithm which we refer to as the Non-Euclidean Matching Pursuit (NEMP) algorithm. The NEMP algorithm can be an interesting option if $\ell_1$-recovery is to be used repeatedly on observations obtained with the same sensing matrix $A$; the numerical complexity of the pursuit algorithm for a given matrix $A$ may be only a fraction of that of $\ell_1$ recovery, especially when used on high-dimensional data.

Suppose that we have at our disposal $\gamma>0$ such that the condition $H(\gamma[1;\dots;1])$ is feasible; invoking Lemma 3, in this case we can efficiently find a contrast matrix $H=[h_1,\dots,h_n]$ such that

$$|[I-H^TA]_{ij}|\le\gamma,\quad1\le i,j\le n,\qquad\nu(H)=\omega_*(\gamma), \qquad (46)$$

where, as always, $\nu(H)=\max_i\nu(h_i)$ with $\nu(h)=\nu_{\epsilon,\sigma,\mathcal{U}}(h)$ given by (4).

Figure 3: Mean recovery error as a function of the number $s$ of nonzero entries in the signal. Gaussian setup parameters: $L=0.01$, $\sigma=0.1$, $\gamma=0.1$, $\|x\|_1=5s$.

Figure 4: Mean recovery error as a function of the nuisance magnitude $L$. Convolution setup parameters: $\sigma=0.1$, $s=2$, $\gamma=0.2$, $\|x\|_1=10$.

Consider a signal $x\in\mathbb{R}^n$ such that $\|x-x^s\|_1\le v$, where, as usual, $x^s$ is the vector obtained from $x$ by replacing all but the $s$ largest in magnitude entries in $x$ with zeros, and let $y$ be an observation as in (3). Suppose that $s\gamma<1$, and let $v\ge0$ be given. Consider the following iterative procedure:



Algorithm 1

1. Initialization: Set $v^{(0)}=0$,
$$\alpha_0=\frac{\|H^Ty\|_{s,1}+s\nu(H)+v}{1-s\gamma}.$$

2. Step $k$, $k=1,2,\dots$: Given $v^{(k-1)}\in\mathbb{R}^n$ and $\alpha_{k-1}>0$, compute

(a) $u=H^T(y-Av^{(k-1)})$ and the vector $\Delta\in\mathbb{R}^n$ with the entries
$$\Delta_i=\mathrm{sign}(u_i)\left[|u_i|-\gamma\alpha_{k-1}-\nu(H)\right]_+,\quad1\le i\le n$$
(here $[a]_+=\max[0,a]$);

(b) Set $v^{(k)}=v^{(k-1)}+\Delta$ and
$$\alpha_k=2s\gamma\alpha_{k-1}+2s\nu(H)+v, \qquad (47)$$
and loop to step $k+1$.

3. The approximate solution found after $k$ iterations is $v^{(k)}$.

Figure 5: Mean recovery error as a function of the noise StD $\sigma$. Convolution setup parameters: $L=0.01$, $s=2$, $\gamma=0.2$, $\|x\|_1=10$.

Figure 6: Mean recovery error as a function of the number $s$ of nonzero entries in the signal. Convolution setup parameters: $L=0.01$, $\sigma=0.1$, $\gamma=0.2$, $\|x\|_1=5s$.
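For concreteness, here is a minimal standard-library Python transcription of Algorithm 1. It assumes that the contrast matrix $H$, the value $\nu(H)$ from (4) and the tail bound $v$ are supplied by the caller: synthesizing $H$ as in (46) requires the convex optimization of Section 3 (Lemma 3), which is not reproduced here. Function and argument names are hypothetical; `est` plays the role of $v^{(k)}$ and `tail` the role of $v$.

```python
import math


def matvec(M, x):
    """M x for a list-of-rows matrix M."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]


def matTvec(M, y):
    """M^T y for a list-of-rows matrix M."""
    return [sum(M[i][j] * y[i] for i in range(len(M))) for j in range(len(M[0]))]


def s_one_norm(u, s):
    """||u||_{s,1}: sum of the s largest magnitudes of u."""
    return sum(sorted((abs(t) for t in u), reverse=True)[:s])


def nemp(A, H, y, s, gamma, nu_H, tail, iters):
    """Algorithm 1 (NEMP). Returns (v^(iters), alpha_iters)."""
    if s * gamma >= 1:
        raise ValueError("Algorithm 1 requires s * gamma < 1")
    n = len(A[0])
    est = [0.0] * n                                                                # v^(0) = 0
    alpha = (s_one_norm(matTvec(H, y), s) + s * nu_H + tail) / (1 - s * gamma)    # alpha_0
    for _ in range(iters):
        resid = [yi - ai for yi, ai in zip(y, matvec(A, est))]   # y - A v^(k-1)
        u = matTvec(H, resid)                                    # step (a): u = H^T (y - A v^(k-1))
        thr = gamma * alpha + nu_H
        delta = [math.copysign(max(abs(ui) - thr, 0.0), ui) for ui in u]
        est = [e + d for e, d in zip(est, delta)]                # step (b): v^(k) = v^(k-1) + Delta
        alpha = 2 * s * gamma * alpha + 2 * s * nu_H + tail      # recursion (47)
    return est, alpha
```

In a noiseless toy example with $A=I+0.01(J-I)$ ($J$ the all-ones matrix), $H=I$, $\gamma=0.01$, $\nu(H)=0$ and an exactly 2-sparse $x$, the iterates converge to $x$ and the $\ell_1$ error stays below $\alpha_k$, in line with Proposition 9.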



Proposition 9 Assume that $s\gamma<1$ and a $v\ge0$ is given. Then there exists a set $\Xi\subset\mathbb{R}^m$, $\mathrm{Prob}\{\xi\in\Xi\}\ge1-\epsilon$, of "good" realizations of $\xi$ such that whenever $\xi\in\Xi$, for every $x\in\mathbb{R}^n$ satisfying $\|x-x^s\|_1\le v$ and every $u\in\mathcal{U}$, the approximate solution $v^{(k)}$ and the value $\alpha_k$ after the $k$-th step of Algorithm 1 satisfy

$(a_k)$ $v^{(k)}_i\in\mathrm{Conv}\{0;x_i\}$ for all $i$,

$(b_k)$ $\|x-v^{(k)}\|_1\le\alpha_k$ and $\|x-v^{(k+1)}\|_\infty\le2\gamma\alpha_k+2\omega_*(\gamma)$.

Figure 7: A typical signal/worst Lasso nuisance. Gaussian setup with parameters: $L=0.05$, $\sigma=0.1$, $s=2$, $\|x\|_1=10$, $\gamma=0.1$.



Note that if $2s\gamma<1$ then also $s\gamma<1$ and Proposition 9 holds true. Furthermore, by (47) the sequence $\alpha_k$ converges exponentially fast to the limit $\alpha_\infty:=\frac{2s\nu(H)+v}{1-2s\gamma}$:

$$\|v^{(k)}-x\|_1\le\alpha_k=(2s\gamma)^k[\alpha_0-\alpha_\infty]+\alpha_\infty.$$
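The closed form for $\alpha_k$ can be checked against the recursion (47) directly; the short sketch below (hypothetical helper names, arbitrary test values) does exactly that. It uses only that $\alpha_k-\alpha_\infty=2s\gamma(\alpha_{k-1}-\alpha_\infty)$, which follows by subtracting the fixed-point equation $\alpha_\infty=2s\gamma\alpha_\infty+2s\nu(H)+v$ from (47).

```python
def alpha_iterates(alpha0, s, gamma, nu_H, tail, k):
    """alpha_0, ..., alpha_k generated by the recursion (47)."""
    seq = [alpha0]
    for _ in range(k):
        seq.append(2 * s * gamma * seq[-1] + 2 * s * nu_H + tail)
    return seq


def alpha_closed_form(alpha0, s, gamma, nu_H, tail, k):
    """(2 s gamma)^k (alpha_0 - alpha_inf) + alpha_inf, valid whenever 2 s gamma != 1."""
    alpha_inf = (2 * s * nu_H + tail) / (1 - 2 * s * gamma)
    return (2 * s * gamma) ** k * (alpha0 - alpha_inf) + alpha_inf
```

When $2s\gamma<1$ the factor $(2s\gamma)^k$ makes the transient term vanish geometrically, which is the "exponentially fast" convergence referred to above.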



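Along with the second inequality of $(b_{k-1})$ this implies the bounds

$$\|v^{(k)}-x\|_\infty\le2\gamma\alpha_{k-1}+2\omega_*(\gamma)\le s^{-1}\alpha_k,$$

and, since $\|z\|_p\le\|z\|_1^{1/p}\|z\|_\infty^{1-1/p}$ for $1\le p\le\infty$,

$$\|v^{(k)}-x\|_p\le s^{\frac1p-1}\left((2s\gamma)^k[\alpha_0-\alpha_\infty]+\alpha_\infty\right).$$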



The bottom line here is as follows:

Corollary 2 Let $\gamma<1/(2s)$ be such that the condition $H(\gamma[1;\dots;1])$ is feasible, so that we can efficiently find a contrast matrix $H$ satisfying (46). With Algorithm 1 associated with $H$ and some $v\ge0$, one ensures that for every $t=1,2,\dots$ the approximate solution $v^{(t)}$ found after $t$ iterations satisfies

$$\mathrm{Risk}_p(v^{(t)}(\cdot)|\epsilon,\sigma,s,v)\le s^{1/p}\left[\frac{2\omega_*(\gamma)+s^{-1}v}{1-2s\gamma}+(2s\gamma)^t\left(\frac{\omega_*(\gamma)+s^{-1}(\|H^Ty\|_{s,1}+v)}{1-s\gamma}-\frac{2\omega_*(\gamma)+s^{-1}v}{1-2s\gamma}\right)\right],\quad1\le p\le\infty$$

(cf. (29)).

To put this result in proper perspective, note that the mutual incoherence based condition

$$\frac{\mu(A)}{1+\mu(A)}<\frac{1}{2s},$$

underlying typical convergence results for Matching Pursuit algorithms as applied to recovery of $s$-sparse signals (see, e.g., [11, 10, 12]), is definitely sufficient for convergence of the NEMP algorithm with $\gamma=\frac{\mu(A)}{1+\mu(A)}$, see Section 4.1. It follows that the scope of NEMP is at least as wide as that of the "theoretically valid" Matching Pursuit algorithms known from the literature; in the situation in question, Corollary 2 recovers some results from [10, 11, 12].
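The quantities in this comparison are easy to compute. The sketch below (hypothetical names) evaluates the mutual incoherence $\mu(A)$, the corresponding $\gamma=\mu(A)/(1+\mu(A))$, and the largest $s$ with $2s\gamma<1$. The remark on $H=A/(1+\mu)$ is our own observation for unit-norm columns: then $[I-H^TA]_{ii}=\mu/(1+\mu)$ and $|[I-H^TA]_{ij}|\le\mu/(1+\mu)$ for $i\ne j$, so the first relation in (46) holds with this $\gamma$; we do not claim this is the exact construction of Section 4.1.

```python
import math


def mutual_incoherence(A):
    """mu(A) = max_{i != j} |a_i^T a_j| / (||a_i||_2 ||a_j||_2) over the columns a_i of A."""
    m, n = len(A), len(A[0])
    cols = [[A[r][c] for r in range(m)] for c in range(n)]
    norms = [math.sqrt(sum(t * t for t in col)) for col in cols]
    mu = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(cols[i], cols[j]))
            mu = max(mu, abs(dot) / (norms[i] * norms[j]))
    return mu


def nemp_gamma(mu):
    """gamma = mu / (1 + mu); with unit-norm columns, H = A / (1 + mu) gives |[I - H^T A]_ij| <= gamma."""
    return mu / (1.0 + mu)


def largest_s(mu):
    """Largest integer s with mu / (1 + mu) < 1 / (2 s), i.e. 2 s gamma < 1."""
    gamma = nemp_gamma(mu)
    if gamma == 0.0:
        return math.inf
    return math.ceil(1.0 / (2.0 * gamma)) - 1
```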
Figure 8: A typical signal/recovery in Convolution setup. Parameters: $L=0.025$, $\sigma=0.1$, $s=2$, $\|x\|_1=10$, $\gamma=0.2$.

References 

[1] Andersen, E. D., Andersen, K. D. The MOSEK optimization tools manual. Version 5.0. http://www.mosek.com/fileadmin/products/5_0/tools/doc/html/tools/index.html

[2] Bickel, P., Ritov, Y. and Tsybakov, A. B. Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist., 37, 1705–1732 (2009).

[3] Bühlmann, P., van de Geer, S. On the conditions used to prove oracle results for the Lasso. Electron. J. Statist., 3, 1360–1392 (2009).

[4] Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. Sparsity oracle inequalities for the Lasso. Electron. J. Statist., 1, 169–194 (2007).

[5] Candès, E., Romberg, J., Tao, T. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52, 489–509 (2006).

[6] Candès, E., Romberg, J., Tao, T. Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math., 59, 1207–1223 (2006).






[7] Candès, E. and Tao, T. The Dantzig selector: statistical estimation when p is much larger than n. Ann. Statist., 35, 2313–2351 (2007).

[8] Candès, E. J. The restricted isometry property and its implications for compressed sensing. Comptes Rendus de l'Acad. des Sci., Série I, 346, 589–592 (2008).

[9] Donoho, D. Statistical estimation and optimal recovery. Ann. Statist., 22, 1, 238–270 (1994).

[10] Donoho, D., Elad, M. Optimally sparse representation in general (non-orthogonal) dictionaries via ℓ1 minimization. Proc. Natl. Acad. Sci. USA, 100, 2197–2202 (2003).

[11] Elad, M., Bruckstein, A. M. A generalized uncertainty principle and sparse representation in pairs of R^N bases. IEEE Trans. Inform. Theory, 48, 2558–2567 (2002).

[12] Gribonval, R., Nielsen, M. Sparse representations in unions of bases. IEEE Trans. Inform. Theory, 49, 3320–3325 (2003).

[13] Juditsky, A. B., Nemirovski, A. S. Nonparametric estimation by convex programming. Ann. Statist., 37, 5A, 2278–2300 (2009).

[14] Juditsky, A., Nemirovski, A. On verifiable sufficient conditions for sparse signal recovery via ℓ1 minimization. Mathematical Programming, 127:1, 57–88 (2011).

[15] Juditsky, A., Kılınç Karzan, F., Nemirovski, A. Verifiable conditions of ℓ1-recovery of sparse signals with sign restrictions. Mathematical Programming, 127:1, 89–122 (2011).

[16] Koltchinskii, V. The Dantzig selector and sparsity oracle inequalities. Bernoulli, 15, 799–828 (2009).

[17] Lounici, K. Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electron. J. Statist., 2, 90–102 (2008).

[18] Meinshausen, N. and Yu, B. Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist., 37, 246–270 (2009).

[19] van de Geer, S. The deterministic Lasso. In JSM Proceedings, American Statistical Association (2007) (see also http://stat.ethz.ch/research/research_reports/2007/140).

[20] Zhang, C.-H., Huang, J. The sparsity and bias of the Lasso selection in high-dimensional linear regression. Ann. Statist., 36, 1567–1594 (2008).

[21] Zhang, T. Some sharp performance bounds for least squares regression with L1 regularization. Ann. Statist., 37, 2109–2144 (2009).

A Proofs 

A.1 Proofs for Section 2
A.1.1 Proof of Lemma 1

The first claim is evident. Now let $H=[h_1,\dots,h_n]$ satisfy $H_{s,\infty}(\kappa)$, and let $h_{n+j}=-h_j$, $1\le j\le n$. Then for every $i\le n$ and every $x\in\mathbb{R}^n$ with $\|x\|_1\le1$ we have

$$\frac{\kappa}{s}+\|H^TAx\|_\infty=\frac{\kappa}{s}+\max_{\lambda\in\mathbb{R}^{2n}_+,\ \sum_j\lambda_j=1}\Big[\sum_{j=1}^{2n}\lambda_jh_j\Big]^TAx\ge x_i,$$

or, which is the same,

$$\min_{x:\|x\|_1\le1}\ \max_{\lambda\in\mathbb{R}^{2n}_+,\ \sum_j\lambda_j=1}\left[\frac{\kappa}{s}+\Big[\sum_{j=1}^{2n}\lambda_jh_j\Big]^TAx-x_i\right]\ge0.$$

By the von Neumann lemma, this is the same as

$$\max_{\lambda\in\mathbb{R}^{2n}_+,\ \sum_j\lambda_j=1}\ \min_{x:\|x\|_1\le1}\left[\frac{\kappa}{s}+\Big[\sum_{j=1}^{2n}\lambda_jh_j\Big]^TAx-x_i\right]\ge0,$$

and the outer max clearly is achieved, meaning that there exists $\lambda^*\ge0$, $\sum_{j=1}^{2n}\lambda^*_j=1$, such that with $h'_i=\sum_{j=1}^{2n}\lambda^*_jh_j$ one has

$$\frac{\kappa}{s}+[h'_i]^TAx-x_i\ge0\quad\forall x:\ \|x\|_1\le1,$$

so that for every $x$ with $\|x\|_1\le1$ one has $\frac{\kappa}{s}+|[h'_i]^TAx|-x_i\ge0$; applying the latter inequality to $-x$ in the role of $x$, we get $\frac{\kappa}{s}+|[h'_i]^TAx|\ge|x_i|$ whenever $\|x\|_1\le1$, whence, of course, $\frac{\kappa}{s}\|x\|_1+|[h'_i]^TAx|\ge|x_i|$ for all $x$. We conclude that the matrix $H'=[h'_1,\dots,h'_n]$ satisfies $H(\frac{\kappa}{s}[1;\dots;1])$. It remains to note that by construction the columns of $H'$ are convex combinations of the columns of $H$ and $-H$, and that building $H'$ reduces to solving $n$ matrix games and thus can be carried out efficiently. ■

A.1.2 Proof of Proposition 1

Let

$$\Xi=\{\xi:\ |h_i^T\xi|\le\sqrt{2\ln(n/\epsilon)}\|h_i\|_2,\ 1\le i\le n\},$$

so that $\mathrm{Prob}\{\xi\in\Xi\}\ge1-\epsilon$. Let us fix $\xi\in\Xi$, a set $I\subset\{1,\dots,n\}$ satisfying (9) with complement $J=\{1,\dots,n\}\setminus I$, a signal $x\in\mathbb{R}^n$ and a realization $u\in\mathcal{U}$ of the nuisance, and let $\hat{x}$ be the value of the estimate (7) at the observation $y=Ax+u+\sigma\xi$. We are about to verify that $\hat{x}$ satisfies (10), which, of course, will complete the proof.
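The claim $\mathrm{Prob}\{\xi\in\Xi\}\ge1-\epsilon$ is a union bound: $h_i^T\xi\sim\mathcal{N}(0,\|h_i\|_2^2)$, so each of the $n$ inequalities fails with probability $P(|Z|>t)=\mathrm{erfc}(t/\sqrt2)$, $t=\sqrt{2\ln(n/\epsilon)}$, and the elementary bound $\mathrm{erfc}(a)\le e^{-a^2}$ ($a\ge0$) makes this at most $\epsilon/n$. The sketch below (hypothetical names) evaluates the resulting union bound numerically.

```python
import math


def gaussian_tail(t):
    """P(|Z| > t) for Z ~ N(0, 1)."""
    return math.erfc(t / math.sqrt(2.0))


def union_bound(n, eps):
    """n * P(|Z| > sqrt(2 ln(n/eps))): an upper bound on Prob{xi not in Xi}."""
    t = math.sqrt(2.0 * math.log(n / eps))
    return n * gaussian_tail(t)
```

In practice the bound is far from tight; for example, with $n=10$ and $\epsilon=0.5$ it is about $0.14$.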
Observe that because of £ G E we have 



\hf(Ax 



\hf(u + cr£)| < max |/if u'| + cr-\/21n(n/e)||/ij||2 = = z^, 1 < i < 



Now, pi > z/j by (8), whence \hf(y — Ax)\ < pi for all i, and thus x is a feasible solution to the optimization 
problem in (7) and thus ||x||i > ||x||i. Setting z = x — x, we now have ||scj||i = ||S||i — ||x/||i < ||x||i — < 
INK ~~ ll x /lli + = + whence \\zj\U < ||xj||i + \\x < 2||xj|h + It follows that 



kill < 2||zj||i + 2||xj||. 



(48) 



Further, \hfA(x — x)\ < \hf(Ax — y)\ + \hf(Ax — y)\. Since x is feasible for the optimization problem in 
(7), we have \hJ(Ax — y)\ < Pi, and we have already seen that \hJ(Ax — y)\ < Vi, hence 



\hjAz\ < pi + Vi 



(49) 



for all 1 < i < n. Applying (5) we now get 



$$\|z_I\|_1=\sum_{i\in I}|z_i|\le\sum_{i\in I}\big[|h_i^TAz|+\gamma_i\|z\|_1\big]\le\sum_{i\in I}(\rho_i+\nu_i)+\Big[\sum_{i\in I}\gamma_i\Big]\|z\|_1=\rho_I+\nu_I+\gamma_I\|z\|_1\le\rho_I+\nu_I+2\gamma_I\big[\|z_I\|_1+\|x_J\|_1\big],$$
with $\rho_I=\sum_{i\in I}\rho_i$, and similarly for $\nu_I$ and $\gamma_I$. The concluding $\le$ is given by (48). Taking into account that $\gamma_I<\frac12$, we get
$$\|z_I\|_1\le\frac{\rho_I+\nu_I+2\gamma_I\|x_J\|_1}{1-2\gamma_I}.$$
Invoking (48) once again, we finally get
$$\|\widehat{x}-x\|_1=\|z_I\|_1+\|z_J\|_1\le2\|z_I\|_1+2\|x_J\|_1\le2\,\frac{\rho_I+\nu_I+2\gamma_I\|x_J\|_1}{1-2\gamma_I}+2\|x_J\|_1,$$
and we arrive at (10.a).

To prove (10.b), we apply (5) to $z=\widehat{x}-x$, thus getting
$$|z_i|\le|h_i^TAz|+\gamma_i\|z\|_1.$$
As we have already seen, $|h_i^TAz|\le\rho_i+\nu_i$, and the first "$\le$" in (10.b) follows. The second "$\le$" in (10.b) is then readily given by (10.a). Now (ii) and (iii) are immediate consequences of (10) and the fact that $\gamma_s\le s\gamma$. ■
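The only property of $\widehat{x}$ used to derive (48) is $\|\widehat{x}\|_1\le\|x\|_1$, and the argument works for any partition $I\cup J$. The following is a minimal numerical sanity check of (48) under that single assumption, using random test vectors and NumPy. It is not a proof, and the helper name `cone_gap` is ours rather than the paper's.

```python
import numpy as np

def cone_gap(x, x_hat, I):
    """Return RHS - LHS of (48): 2||z_I||_1 + 2||x_J||_1 - ||z||_1 with z = x_hat - x.

    (48) asserts this is >= 0 whenever ||x_hat||_1 <= ||x||_1.
    """
    J = np.setdiff1d(np.arange(x.size), I)
    z = x_hat - x
    return 2 * np.abs(z[I]).sum() + 2 * np.abs(x[J]).sum() - np.abs(z).sum()

rng = np.random.default_rng(0)
worst = np.inf
for _ in range(2000):
    n = 20
    x = rng.standard_normal(n)
    x_hat = rng.standard_normal(n)
    # Rescale x_hat so that ||x_hat||_1 <= ||x||_1 (the consequence of feasibility of x).
    x_hat *= rng.uniform(0, 1) * np.abs(x).sum() / np.abs(x_hat).sum()
    I = rng.choice(n, size=rng.integers(1, n), replace=False)
    worst = min(worst, cone_gap(x, x_hat, I))
print(worst >= -1e-12)
```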



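A.1.3 Proof of Lemma 2

In what follows, we use the notation from Proposition 1. For $x\in X(s,\upsilon)$, denoting by $I$ the support of $x^s$, we have
$$\|x_J\|_1\le\upsilon,\quad\rho_I\le\rho_s\le s\rho,\quad\nu_I\le\nu_s\le s\nu(H),\quad\gamma_I\le\gamma_s\le s\gamma.$$
Assuming $\gamma_s<\frac12$, for $\xi\in\Xi$ (which happens with probability $\ge1-\epsilon$), (12) implies that for all $u\in\mathcal{U}$ it holds
$$\|\widehat{x}_{\rm reg}(y)-x\|_1\le\underbrace{\frac{2}{1-2\gamma_s}\big[\upsilon+\rho_s+\nu_s\big]}_{P},\qquad\|\widehat{x}_{\rm reg}(y)-x\|_\infty\le\underbrace{\rho+\nu(H)+\frac{2\gamma}{1-2\gamma_s}\big[\upsilon+\rho_s+\nu_s\big]}_{Q},$$
which combines with the standard bound $\|z\|_p\le\|z\|_1^{1/p}\|z\|_\infty^{(p-1)/p}$ to imply (13). When $s\gamma<\frac12$, we clearly have
$$P\le\frac{2}{1-2s\gamma}\big[\upsilon+s(\rho+\nu(H))\big],\qquad Q\le\rho+\nu(H)+\frac{2\gamma}{1-2s\gamma}\big[\upsilon+s(\rho+\nu(H))\big]=\frac{\rho+\nu(H)+2\gamma\upsilon}{1-2s\gamma},$$
and (14) follows due to $\|\widehat{x}_{\rm reg}(y)-x\|_p\le P^{1/p}Q^{(p-1)/p}$.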
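Two facts used above can be checked numerically. The first is the interpolation bound $\|z\|_p\le\|z\|_1^{1/p}\|z\|_\infty^{(p-1)/p}$, which follows from $\sum_i|z_i|^p\le\|z\|_\infty^{p-1}\sum_i|z_i|$. The second is the closed form of the bound on $Q$ when $2s\gamma<1$. The sketch below assumes NumPy, and the parameter values in the second check are arbitrary.

```python
import numpy as np

def interp_bound(z, p):
    """||z||_1^(1/p) * ||z||_inf^((p-1)/p), an upper bound on ||z||_p for p >= 1."""
    l1, linf = np.abs(z).sum(), np.abs(z).max()
    return l1 ** (1.0 / p) * linf ** ((p - 1.0) / p)

rng = np.random.default_rng(1)
ok = True
for _ in range(1000):
    z = rng.standard_normal(30) * rng.exponential(size=30)
    p = rng.uniform(1.0, 10.0)
    ok &= np.linalg.norm(z, ord=p) <= interp_bound(z, p) * (1 + 1e-12)

# Bound on Q before and after simplification (valid when 2*s*gamma < 1).
def q_long(rho, nu, gamma, ups, s):
    return rho + nu + 2 * gamma * (ups + s * (rho + nu)) / (1 - 2 * s * gamma)

def q_short(rho, nu, gamma, ups, s):
    return (rho + nu + 2 * gamma * ups) / (1 - 2 * s * gamma)

print(ok, np.isclose(q_long(0.3, 0.2, 0.01, 1.5, 10), q_short(0.3, 0.2, 0.01, 1.5, 10)))
```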



A.1.4 Proof of Proposition 2

The proof is obtained by minor modifications of the proof of Proposition 1. As in that proof, let $\Xi=\{\xi:\ |h_i^T\xi|\le\sqrt{2\ln(n/\epsilon)}\|h_i\|_2,\ 1\le i\le n\}$, where $h_i$ are the columns of $H$, so that $\mathrm{Prob}\{\xi\in\Xi\}\ge1-\epsilon$. Let us fix $\xi\in\Xi$, $x\in\mathbb{R}^n$ and $u\in\mathcal{U}$, and let $\eta=\sigma\xi+u$, $y=Ax+\eta$, $\widehat{x}=\widehat{x}_{\rm reg}(y)$, $z=\widehat{x}-x$. Finally, let $I$ be the support of $x^s$.

Due to $\xi\in\Xi$, we have
$$|h_i^T(Ax-y)|=|h_i^T(u+\sigma\xi)|\le\max_{u'\in\mathcal{U}}|h_i^Tu'|+\sigma\sqrt{2\ln(n/\epsilon)}\|h_i\|_2=:\nu_i,\quad1\le i\le n,$$
whence, by (8), $x$ is a feasible solution to the optimization problem in (7), and thus $\|\widehat{x}\|_1\le\|x\|_1$. Exactly as in the proof of Proposition 1, this implies the validity of (48):
$$\|z\|_1\le2\|z_I\|_1+2\|x_J\|_1.\qquad(50)$$
Besides this, the same reasoning as in the proof of Proposition 1 results in (49), whence
$$\|H^TAz\|_\infty\le\rho+\nu(H).\qquad(51)$$

Applying (6) to $z$, we get
$$\|z_I\|_1\le s\|z\|_\infty\le s\|H^TAz\|_\infty+\kappa\|z\|_1\le s(\rho+\nu(H))+\kappa\|z\|_1,$$
which combines with (50) to imply that
$$\|z\|_1\le\frac{1}{1-2\kappa}\big[2s(\rho+\nu(H))+2\|x_J\|_1\big],\qquad(52)$$
which is nothing but the first relation in (15). Applying (6) to $z$ once again, we get
$$\|z\|_\infty\le\|H^TAz\|_\infty+\frac{\kappa}{s}\|z\|_1,$$
which combines with (52) to imply the second relation in (15). Relation (15) combines with the moment inequality to imply (16). ■

A.1.5 Proof of Proposition 3

(i): Given $\sigma,\epsilon$, let, as in the proof of Proposition 1, $\Xi=\{\xi:\ |h_i^T\xi|\le\sqrt{2\ln(n/\epsilon)}\|h_i\|_2,\ 1\le i\le n\}$, so that $\mathrm{Prob}\{\xi\in\Xi\}\ge1-\epsilon$. Let us fix $\xi\in\Xi$, $u\in\mathcal{U}$ and a signal $x\in\mathbb{R}^n$, and let us prove that (21) takes place for these data; this clearly will prove (i). Let us set $y=Ax+\sigma\xi+u$, $\widehat{x}=\widehat{x}_{\rm pen}(y)$, $z=\widehat{x}-x$, $\eta=u+\sigma\xi$. Let also $I$ be the support of $x^s$.
Observe that by the origin of $\widehat{x}$ we have
$$\|\widehat{x}\|_1+s\theta\|H^T(A\widehat{x}-y)\|_\infty\le\|x\|_1+s\theta\|H^T(Ax-y)\|_\infty=\|x\|_1+s\theta\|H^T\eta\|_\infty\qquad(53)$$
and
$$\|H^T(A\widehat{x}-y)\|_\infty=\|H^T(Az+Ax-y)\|_\infty\ge\|H^TAz\|_\infty-\|H^T(Ax-y)\|_\infty=\|H^TAz\|_\infty-\|H^T\eta\|_\infty.$$
Combining the resulting inequality with (53), we get
$$\|\widehat{x}\|_1+s\theta\|H^TAz\|_\infty\le\|x\|_1+2s\theta\|H^T\eta\|_\infty\le\|x\|_1+2s\theta\nu(H),\qquad(54)$$
where the concluding $\le$ is due to $\xi\in\Xi$ combined with (20). Further,
$$\|\widehat{x}\|_1=\|x+z\|_1=\|x_I+z_I\|_1+\|x_J+z_J\|_1\ge\|x_I\|_1-\|z_I\|_1+\|z_J\|_1-\|x_J\|_1,$$
which combines with (54) to imply that
$$\|x_I\|_1-\|z_I\|_1+\|z_J\|_1-\|x_J\|_1+s\theta\|H^TAz\|_\infty\le\|x\|_1+2s\theta\nu(H),$$
or, equivalently,
$$\|z_J\|_1-\|z_I\|_1+s\theta\|H^TAz\|_\infty\le2\|x_J\|_1+2s\theta\nu(H).\qquad(55)$$

By (5), we have
$$\forall i:\ |z_i|\le\|H^TAz\|_\infty+\gamma_i\|z\|_1,\qquad(56)$$
whence $\|z_I\|_1\le s\|H^TAz\|_\infty+\gamma_s\|z\|_1$, and therefore
$$(1-\gamma_s)\|z_I\|_1-\gamma_s\|z_J\|_1-s\|H^TAz\|_\infty\le0.$$
Multiplying the latter inequality by $\theta$ and summing it with (55), we get
$$[\theta(1-\gamma_s)-1]\|z_I\|_1+(1-\theta\gamma_s)\|z_J\|_1\le2\|x_J\|_1+2s\theta\nu(H).$$
In view of condition (19), the coefficients on the left-hand side are positive, and (21.a) follows.
To prove (21.b), note that (54) implies
$$\|H^TAz\|_\infty\le\frac{1}{s\theta}\big[\|x\|_1-\|\widehat{x}\|_1\big]+2\nu(H)\le\frac{\|z\|_1}{s\theta}+2\nu(H),$$
which combines with (56) to imply that
$$\|z\|_\infty\le\frac{1}{s\theta}\|z\|_1+2\nu(H)+\gamma\|z\|_1.$$

Recalling that $z=\widehat{x}-x$ and invoking (21.a), (21.b) follows.

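(ii)–(iii): (22) is an immediate consequence of (21) due to $\gamma_s\le s\gamma$. Assume that $x\in X(s,\upsilon)$ and $\theta=2$. Taking into account that $\gamma_s\le s\gamma$, we obtain from (22) that, uniformly in $\xi\in\Xi$ and $u\in\mathcal{U}$,
$$\|\widehat{x}_{\rm pen}(y)-x\|_1\le\underbrace{\frac{2\upsilon+4s\nu(H)}{1-2s\gamma}}_{P},\qquad\|\widehat{x}_{\rm pen}(y)-x\|_\infty\le\underbrace{\frac{(s^{-1}+2\gamma)\upsilon+4\nu(H)}{1-2s\gamma}}_{Q}.$$
Using, as in the proof of Lemma 2, the standard bound
$$\|z\|_p\le\|z\|_1^{1/p}\|z\|_\infty^{(p-1)/p}\le P^{1/p}Q^{(p-1)/p},$$
we come to (23).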
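With $\theta=2$, the bound $Q$ above is obtained by substituting $\|z\|_1\le P$ into the sup-norm estimate $\|z\|_\infty\le\big(\frac{1}{s\theta}+\gamma\big)\|z\|_1+2\nu(H)$. The small check below, in plain Python with arbitrary parameter values satisfying $2s\gamma<1$, confirms that the two expressions agree. The function name `pen_bounds` is hypothetical.

```python
def pen_bounds(ups, nu, gamma, s):
    """P and Q from the proof of Proposition 3 with theta = 2 (needs 2*s*gamma < 1)."""
    P = (2 * ups + 4 * s * nu) / (1 - 2 * s * gamma)
    Q = ((1 / s + 2 * gamma) * ups + 4 * nu) / (1 - 2 * s * gamma)
    return P, Q

ups, nu, gamma, s, theta = 0.7, 0.05, 0.004, 25, 2.0
P, Q = pen_bounds(ups, nu, gamma, s)
# Sup-norm estimate ||z||_1/(s*theta) + 2*nu + gamma*||z||_1 evaluated at ||z||_1 = P.
Q_direct = P / (s * theta) + 2 * nu + gamma * P
print(P, abs(Q - Q_direct) < 1e-12)
```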



A.1.6 Proof of Proposition 4

The proof is obtained by minor modifications of the proof of Proposition 1. As in that proof, let $\Xi=\{\xi:\ |h_i^T\xi|\le\sqrt{2\ln(n/\epsilon)}\|h_i\|_2,\ 1\le i\le n\}$, so that $\mathrm{Prob}\{\xi\in\Xi\}\ge1-\epsilon$.

Let us fix $\xi\in\Xi$, $u\in\mathcal{U}$ and a signal $x\in\mathbb{R}^n$. Let us set $y=Ax+\sigma\xi+u$, $\widehat{x}=\widehat{x}_{\rm pen}(y)$, $z=\widehat{x}-x$, $\eta=u+\sigma\xi$. Let also $I$ be the support of $x^s$.

Observe that by the origin of $\widehat{x}$, and due to $\theta=2$, we have
$$\|\widehat{x}\|_1+2s\|H^T(A\widehat{x}-y)\|_\infty\le\|x\|_1+2s\|H^T(Ax-y)\|_\infty=\|x\|_1+2s\|H^T\eta\|_\infty\qquad(57)$$
and
$$\|H^T(A\widehat{x}-y)\|_\infty=\|H^T(Az+Ax-y)\|_\infty\ge\|H^TAz\|_\infty-\|H^T\eta\|_\infty.$$
Combining the resulting inequality with (57), we get
$$\|\widehat{x}\|_1+2s\|H^TAz\|_\infty\le\|x\|_1+4s\|H^T\eta\|_\infty\le\|x\|_1+4s\nu(H),\qquad(58)$$
where the concluding $\le$ is due to $\xi\in\Xi$ combined with the definition of $\nu(H)$. Further,
$$\|\widehat{x}\|_1=\|x+z\|_1=\|x_I+z_I\|_1+\|x_J+z_J\|_1\ge\|x_I\|_1-\|z_I\|_1+\|z_J\|_1-\|x_J\|_1,$$
which combines with (58) to imply that
$$\|x_I\|_1-\|z_I\|_1+\|z_J\|_1-\|x_J\|_1+2s\|H^TAz\|_\infty\le\|x\|_1+4s\nu(H),$$
or, equivalently,
$$\|z_J\|_1-\|z_I\|_1+2s\|H^TAz\|_\infty\le2\|x_J\|_1+4s\nu(H).\qquad(59)$$

By (6) we have
$$\|z\|_\infty\le\|H^TAz\|_\infty+\frac{\kappa}{s}\|z\|_1,\qquad(60)$$
whence $\|z_I\|_1\le s\|H^TAz\|_\infty+\kappa\|z\|_1$, and therefore
$$(1-\kappa)\|z_I\|_1-\kappa\|z_J\|_1-s\|H^TAz\|_\infty\le0.$$
Multiplying the latter inequality by 2 and summing it with (59), we get
$$(1-2\kappa)\|z\|_1\le2\|x_J\|_1+4s\nu(H),$$
which is the first relation in (24). The second relation in (24) is readily given by the first one combined with (6). We have proved that (24) holds true whenever $\xi\in\Xi$; since $\mathrm{Prob}\{\xi\in\Xi\}\ge1-\epsilon$, (25) follows. ■



A.2 Proofs for sections 3, 4
A.2.1 Proof of Lemma 3

(i)⇒(iii): If $h_i$ satisfies $(V_i)$, then for every $x$ we have
$$|x_i|\le|h_i^TAx|+\gamma\|x\|_1\le\omega\phi^*(Ax)+\gamma\|x\|_1,$$
where the first and the second inequalities are given by $(V_i.b)$ and $(V_i.a)$, respectively. □
(iii)⇒(ii): Assume that (iii) takes place. Then, by homogeneity, $\omega\phi^*(Ax)+\gamma\ge x_i$ for every $x$ with $\|x\|_1\le1$. Equivalently, the optimal value in the conic problem
$$\min_x\big\{\omega\phi^*(Ax)-x_i:\ \|x\|_1\le1\big\}$$
is $\ge-\gamma$. The problem clearly is strictly feasible and bounded, so by the Conic Duality Theorem the dual problem is solvable with the same optimal value. Now, the dual problem reads
$$\max_{g,h,s}\big\{-s:\ \phi(h)\le\omega,\ \|g\|_\infty\le s,\ A^Th+g=e_i\big\},$$
and the fact that it is solvable with optimal value $\ge-\gamma$ means that there exist $h,g$ such that $\phi(h)\le\omega$, $\|g\|_\infty\le\gamma$ and $A^Th+g=e_i$. Hence $h$ is a feasible solution to $(P_i^\gamma)$ with objective value $\le\omega$. □

(ii)⇒(i): If $(P_i^\gamma)$ is feasible, it clearly is solvable. Thus, in the case of (ii) there exists $h$ with $\phi(h)\le\omega$ and $\|A^Th-e_i\|_\infty\le\gamma$. The latter inequality implies $|e_i^Tx-h^TAx|\le\gamma\|x\|_1$ for every $x$, so that $|x_i|-|h^TAx|\le\gamma\|x\|_1$ for all $x$. We see that $h$ satisfies $(V_i)$, and thus (i) takes place. This reasoning also shows that whenever $(P_i^\gamma)$ is feasible with optimal value $\le\omega$, it is solvable, and its optimal solution satisfies $(V_i)$. ■
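The step (ii)⇒(i) rests only on the bound $|(e_i-A^Th)^Tx|\le\|A^Th-e_i\|_\infty\|x\|_1$. This means that any $h$ certifies $(V_i.b)$ with $\gamma=\|A^Th-e_i\|_\infty$. Below is a small NumPy illustration on a random instance. The least-squares choice of $h$ is arbitrary and is not the optimal solution of $(P_i^\gamma)$, which would require a conic solver.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, i = 15, 40, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)
e_i = np.zeros(n)
e_i[i] = 1.0

# Any h is feasible for (P_i^gamma) with gamma = ||A^T h - e_i||_inf; here h is a
# least-squares fit of A^T h to e_i (an illustrative choice only).
h, *_ = np.linalg.lstsq(A.T, e_i, rcond=None)
gamma = np.abs(A.T @ h - e_i).max()

# (V_i.b): |x_i| <= |h^T A x| + gamma * ||x||_1 must hold for every x.
X = rng.standard_normal((5000, n))
lhs = np.abs(X[:, i])
rhs = np.abs(X @ A.T @ h) + gamma * np.abs(X).sum(axis=1)
print(bool(np.all(lhs <= rhs + 1e-12)))
```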



A.2.2 Proof of Proposition 5

Let $\gamma=\gamma(\delta,k)$ and $\Delta=r(\mathcal{U})+\sigma\sqrt{2\ln(n/\epsilon)}$. What we need to prove is that there exists a matrix $H$ satisfying $\mathcal{H}_{1,\infty}(\gamma)$ and such that $\nu(H)\le\Delta/\sqrt{1-\delta}$. Invoking Lemma 3, all we need to this end is to show that

$$\forall x\in\mathbb{R}^n:\ \|x\|_\infty\le\frac{\Delta}{\sqrt{1-\delta}}\,\nu^*(Ax)+\gamma\|x\|_1.\qquad(61)$$

Now, we clearly have $\nu(h)\le\max_{u\in\mathcal{U}}u^Th+\sigma\sqrt{2\ln(n/\epsilon)}\|h\|_2\le\Delta\|h\|_2$ for all $h$, whence $\nu^*(\eta)\ge\Delta^{-1}\|\eta\|_2$ for all $\eta$. Therefore, all we need in order to justify (61) is to prove that
$$\forall x\in\mathbb{R}^n:\ \|x\|_\infty\le\frac{1}{\sqrt{1-\delta}}\|Ax\|_2+\gamma\|x\|_1.\qquad(62)$$

Let $x\in\mathbb{R}^n$. Set $s=\lfloor k/2\rfloor$, and obtain vectors $x^1,\dots,x^q$ from $x$ by the following procedure. $x^1$ is obtained by zeroing all but the $s$ largest in magnitude entries of $x$. $x^2$ is obtained by the same procedure from $x-x^1$, $x^3$ by the same procedure from $x-x^1-x^2$, and so on, until the step $q$ where we get $x=x^1+\dots+x^q$. We clearly have $\|x^j\|_\infty\le s^{-1}\|x^{j-1}\|_1$, $2\le j\le q$. Since the vectors $x^j$ are $s$-sparse, it follows that $\|x^j\|_2\le s^{-1/2}\|x^{j-1}\|_1$, $2\le j\le q$. Setting $\|Ax\|_2=\alpha$ and $\|Ax^1\|_2=\beta$, we have

$$\alpha\beta=\|Ax\|_2\|Ax^1\|_2\ge(Ax)^TAx^1=[x^1]^TA^TAx^1+\sum_{j=2}^q[x^1]^TA^TAx^j\ge\beta^2-\sum_{j=2}^q\delta\|x^1\|_2\|x^j\|_2,$$
where the last $\ge$ is given by the following well-known fact [8]:

(!) If $A$ is RIP$(\delta,k)$ and $u,v$ are supported on a common set of indices $I$ of cardinality $k$ and are orthogonal, then $|u^TA^TAv|\le\delta\|u\|_2\|v\|_2$.
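Fact (!) holds because $u\perp v$ gives $u^TA^TAv=u_I^T(A_I^TA_I-I)v_I$, and the spectral norm of $A_I^TA_I-I$ is at most $\delta$. The sketch below checks this with NumPy for one random support $I$. Here $\delta_I$ is computed for that support only. The RIP constant $\delta$ is a maximum over all supports, so using $\delta_I$ only makes the check stricter.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 40, 60, 6
A = rng.standard_normal((m, n)) / np.sqrt(m)
I = rng.choice(n, size=k, replace=False)

# Deviation of the restricted Gram matrix from the identity on the support I.
G = A[:, I].T @ A[:, I]
delta_I = np.abs(np.linalg.eigvalsh(G) - 1.0).max()

ok = True
for _ in range(1000):
    # Two orthogonal vectors supported on I (one Gram-Schmidt step).
    a, b = rng.standard_normal(k), rng.standard_normal(k)
    b -= (a @ b) / (a @ a) * a
    u, v = np.zeros(n), np.zeros(n)
    u[I], v[I] = a, b
    ok &= abs(u @ A.T @ A @ v) <= delta_I * np.linalg.norm(u) * np.linalg.norm(v) + 1e-12
print(ok)
```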

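It follows that
$$\alpha\beta\ge\beta^2-\delta\|x^1\|_2\sum_{j=2}^q\|x^j\|_2\ge\beta^2-\frac{\delta}{\sqrt s}\|x^1\|_2\sum_{j=2}^q\|x^{j-1}\|_1\ge\beta^2-\frac{\delta}{\sqrt s}\|x^1\|_2\|x\|_1.$$
Hence
$$\beta\le\alpha+\frac{\delta\|x^1\|_2\|x\|_1}{\sqrt s\,\beta}\le\alpha+\frac{\delta\|x\|_1}{\sqrt{s(1-\delta)}},$$
where the second inequality is due to the fact that $\|x^1\|_2/\beta\le1/\sqrt{1-\delta}$ by RIP. Thus,
$$\|x\|_\infty\le\|x^1\|_2\le\frac{\beta}{\sqrt{1-\delta}}\le\frac{\alpha}{\sqrt{1-\delta}}+\frac{\delta\|x\|_1}{(1-\delta)\sqrt s}\le\frac{\alpha}{\sqrt{1-\delta}}+\gamma\|x\|_1,$$
where the concluding inequality is due to $s\ge(k-1)/2$ and $\gamma=\gamma(\delta,k)$. Recalling that $\alpha=\|Ax\|_2$, (62) follows. ■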
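The block decomposition $x=x^1+\dots+x^q$ used above, and the two inequalities $\|x^j\|_\infty\le s^{-1}\|x^{j-1}\|_1$ and $\|x^j\|_2\le s^{-1/2}\|x^{j-1}\|_1$, can be reproduced with a few lines of NumPy. Sorting once by decreasing magnitude and cutting the order into chunks of length $s$ gives the same blocks as the iterative procedure, up to tie-breaking. The name `split_blocks` is ours.

```python
import numpy as np

def split_blocks(x, s):
    """Decompose x = x^1 + ... + x^q; each x^j keeps the s largest-magnitude
    entries of what remains (the procedure in the proof of Proposition 5)."""
    order = np.argsort(-np.abs(x), kind="stable")  # indices by decreasing |x_i|
    blocks = []
    for start in range(0, x.size, s):
        xj = np.zeros_like(x)
        idx = order[start:start + s]
        xj[idx] = x[idx]
        blocks.append(xj)
    return blocks

rng = np.random.default_rng(4)
x = rng.standard_normal(23) * rng.exponential(size=23)
s = 4
blocks = split_blocks(x, s)
ok = np.allclose(sum(blocks), x)
for prev, cur in zip(blocks, blocks[1:]):
    l1_prev = np.abs(prev).sum()
    ok &= np.abs(cur).max() <= l1_prev / s + 1e-12             # ||x^j||_inf <= ||x^{j-1}||_1 / s
    ok &= np.linalg.norm(cur) <= l1_prev / np.sqrt(s) + 1e-12  # ||x^j||_2 <= s^{-1/2} ||x^{j-1}||_1
print(len(blocks), ok)  # 6 blocks for n = 23, s = 4
```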



A.2.3 Proof of Proposition 6

Proof. We start with the analysis of $O(S,\omega)$. Let $i\in\{1,\dots,n\}$, and let $I\ni i$ be a subset of $\{1,\dots,n\}$ of cardinality $S$. Let $\mathbb{R}^n_I$ be the linear space of all vectors from $\mathbb{R}^n$ supported on $I$, and let $X_R=\{x\in\mathbb{R}^n_I:\ \|x\|_2\le R\}$. Assume that we are given a noisy observation $y=Ax+u+\sigma\xi$ of a signal $z=(x,u)\in X_R\times\mathcal{U}$, and that we want to recover the linear form $x_i$ of the signal from this observation. From $O(S,\omega)$ it follows that there exists a recovering routine such that, for every $x\in X_R$ and $u\in\mathcal{U}$, the probability of a recovering error $>\omega$ is $\le\epsilon$. Assuming $\epsilon<1/16$ and applying the celebrated result of Donoho [9], there exists a linear estimate $\phi_R^Ty$ such that, for every $x\in X_R$ and $u\in\mathcal{U}$, the probability that the error of this estimate is $>1.22\omega$ is $\le\epsilon$. Moreover (cf. Proposition 4.2 of [13]), one can pick $\phi_R$ such that

$$\begin{array}{ll}(a)&\mathrm{Prob}\{\phi_R^T[u+\sigma\xi+A_Ix]-x_i>1.22\omega\}\le\epsilon/2,\\(b)&\mathrm{Prob}\{\phi_R^T[u+\sigma\xi+A_Ix]-x_i<-1.22\omega\}\le\epsilon/2,\end{array}$$
where $A_I$ is the matrix obtained from $A$ by zeroing the columns with indices not belonging to $I$. Let $p(R)=\max_{u\in\mathcal{U}}|\phi_R^Tu|$ and $r(R)=\|A_I^T\phi_R-e_i\|_2$, where $e_i$ is the $i$-th basic orth (so that $x_i=e_i^Tx$). Let $\bar x$ be the vector from $X_R$ such that $\bar x^T(A_I^T\phi_R-e_i)=Rr(R)$, and let $\bar u$ be the vector from $\mathcal{U}$ such that $\phi_R^T\bar u=p(R)$; the required $\bar x,\bar u$ clearly exist. Applying (a) to the pair $(x,u)=(\bar x,\bar u)$ and (b) to the pair $(x,u)=(-\bar x,-\bar u)$, we get
$$\mathrm{Prob}\{\sigma\phi_R^T\xi>1.22\omega-Rr(R)-p(R)\}\le\epsilon/2,\qquad\mathrm{Prob}\{\sigma\phi_R^T\xi<-1.22\omega+Rr(R)+p(R)\}\le\epsilon/2.$$
Hence, denoting by $\mathrm{erfinv}(\epsilon)$ the upper $\epsilon$-quantile of the standard normal distribution (i.e., $\mathrm{Prob}\{\mathcal{N}(0,1)>\mathrm{erfinv}(\epsilon)\}=\epsilon$), and using $\sigma\phi_R^T\xi\sim\mathcal{N}(0,\sigma^2\|\phi_R\|_2^2)$, we obtain
$$\mathrm{erfinv}(\epsilon/2)\,\sigma\|\phi_R\|_2\le1.22\omega-Rr(R)-p(R).$$

It follows that, as $R\to\infty$, $\phi_R$ remains bounded and $r(R)=\|e_i-A_I^T\phi_R\|_2\to0$. Thus, there exists a sequence $R_k\to+\infty$ such that $\phi_{R_k}$ converges to a limit $\phi$ as $k\to\infty$, and this limit satisfies the relations
$$\mathrm{erfinv}(\epsilon/2)\,\sigma\|\phi\|_2+\max_{u\in\mathcal{U}}|u^T\phi|\le1.22\omega\quad\text{and}\quad A_I^T\phi=e_i.$$

Taking into account that $\mathrm{erfinv}(\epsilon/2)\ge0.92\sqrt{\ln(1/\epsilon)}$ when $\epsilon<1/16$, we arrive at the following result:

Lemma 6. Under assumption $O(S,\omega)$, for every $i\le n$ and every $S$-element subset $I\ni i$ of $\{1,\dots,n\}$ there exists $\phi\in\mathbb{R}^m$ such that $\phi^Ta_i=1$ and $\phi^Ta_j=0$ for all $j\in I$, $j\ne i$ (here $a_1,\dots,a_n$ are the columns of $A$), and
$$\max_{u\in\mathcal{U}}|u^T\phi|+\sigma\sqrt{\ln(1/\epsilon)}\,\|\phi\|_2\le\sqrt2\,\omega.\qquad(63)$$
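The numerical claim used to obtain (63) is that the upper $(\epsilon/2)$-quantile of $\mathcal{N}(0,1)$ is at least $0.92\sqrt{\ln(1/\epsilon)}$ for $\epsilon<1/16$. Here we read "$\mathrm{erfinv}$" as that upper quantile, which is how it is used in the argument. The check below uses only the Python standard library (`statistics.NormalDist`) and a log-spaced grid of $\epsilon$ values. It is a spot check on a grid, not a proof.

```python
import math
from statistics import NormalDist

def upper_quantile(p):
    """Return q with Prob{N(0,1) > q} = p (the quantity written erfinv(p) in the text)."""
    return -NormalDist().inv_cdf(p)

# erfinv(eps/2) - 0.92 * sqrt(ln(1/eps)) on eps in (1e-15, 1/16).
grid = [10.0 ** (-k / 4) for k in range(5, 61)] + [1 / 16 - 1e-9]
margins = [upper_quantile(e / 2) - 0.92 * math.sqrt(math.log(1 / e)) for e in grid]
print(min(margins) > 0)
```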



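We claim that in this case, for all $x\in\mathbb{R}^n$,
$$\|x\|_\infty\le\varpi\,\nu^*(Ax)+\frac{2\omega\|A\|}{\sigma\sqrt{2S\ln(1/\epsilon)}}\,\|x\|_1,\qquad\varpi:=2\sqrt{1+\ln(n)/\ln(1/\epsilon)}\;\omega.\qquad(64)$$

Taking this claim for granted and invoking Lemma 3, we immediately arrive at the desired conclusion. Indeed, given $s$ satisfying (32), we have $\frac{1}{4s}\ge\frac{2\omega\|A\|}{\sigma\sqrt{2S\ln(1/\epsilon)}}$, so that (64) implies that
$$\forall x\in\mathbb{R}^n:\ \|x\|_\infty\le\varpi\,\nu^*(Ax)+\frac{1}{4s}\|x\|_1,$$
whence, by Lemma 3, there exists $H$ satisfying the condition $\mathcal{H}_{s,\infty}(\frac14)$ and such that $\nu(H)\le\varpi$, which is exactly what Proposition 6 states.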

It remains to prove (64). Let us fix $x\in\mathbb{R}^n$, and let $I$ be the set of indices of the $S$ largest in magnitude entries of $x$. Denoting by $J$ the complement of $I$ in $\{1,\dots,n\}$, we have $\|x_J\|_\infty\le S^{-1}\|x\|_1$, whence
$$\|x_J\|_2\le\|x_J\|_\infty^{1/2}\|x_J\|_1^{1/2}\le S^{-1/2}\|x\|_1^{1/2}\|x\|_1^{1/2}=S^{-1/2}\|x\|_1.\qquad(65)$$

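Let $i_*\in I$ be the index of the largest in magnitude entry of $x$. By Lemma 6 there exists $\phi\in\mathbb{R}^m$ satisfying (63) and such that $\phi^Ta_{i_*}=\mathrm{sign}(x_{i_*})$ and $\phi^Ta_i=0$ for $i\in I\setminus\{i_*\}$. We have
$$\nu(\phi)=\max_{u\in\mathcal{U}}|u^T\phi|+\sigma\sqrt{2\ln(n/\epsilon)}\,\|\phi\|_2\le\sqrt{2\big[1+\ln(n)/\ln(1/\epsilon)\big]}\,\Big[\max_{u\in\mathcal{U}}|u^T\phi|+\sigma\sqrt{\ln(1/\epsilon)}\,\|\phi\|_2\Big]\le2\sqrt{1+\ln(n)/\ln(1/\epsilon)}\;\omega,\qquad(66)$$
where the concluding $\le$ is given by (63). Now,
$$\nu(\phi)\,\nu^*(Ax)\ge\phi^TAx=\phi^TAx_I+\phi^TAx_J=|x_{i_*}|+\phi^TAx_J=\|x\|_\infty+\phi^TAx_J\ge\|x\|_\infty-\|\phi\|_2\|Ax_J\|_2\ge\|x\|_\infty-\|\phi\|_2\|A\|\,\|x_J\|_2\ge\|x\|_\infty-\|\phi\|_2\|A\|S^{-1/2}\|x\|_1,$$
with the concluding $\ge$ given by (65). In view of (66) and the bound $\|\phi\|_2\le\frac{\sqrt2\,\omega}{\sigma\sqrt{\ln(1/\epsilon)}}$ given by (63), the resulting inequality implies (64). ■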
A.3 Proofs for section 5
A.3.1 Proof of Lemma 4

Recall (cf., e.g., Theorem 2.1 in [11]) that a necessary and sufficient condition for an $m\times n$ matrix $A$ to be $s$-good is the following nullspace property: there exists $\kappa<1/2$ such that
$$\|x\|_{s,1}\le\kappa\|x\|_1\quad\forall\,(x\in\mathbb{R}^n:\ Ax=0).\qquad(67)$$

Assume that this condition is satisfied, and let $u\in\mathbb{R}^n$ be a vector with $s$ nonzero coordinates, each equal to $\pm1$. By (67), the optimal value in the Linear Programming problem
$$\max_x\big\{u^Tx:\ Ax=0,\ \|x\|_1\le1\big\}$$
is at most $\kappa$. Passing to the dual problem, we conclude that there exist $h_u$ and $g_u$ such that $A^Th_u+g_u=u$ and $\|g_u\|_\infty\le\kappa$, whence for every $x\in\mathbb{R}^n$ it holds
$$u^Tx=x^TA^Th_u+g_u^Tx\le\|h_u\|_1\|Ax\|_\infty+\kappa\|x\|_1.$$
Since the set $U$ of the outlined vectors $u$ is finite, the quantity $L=\max_{u\in U}\|h_u\|_1$ is finite, and
$$\|x\|_{s,1}=\max_{u\in U}u^Tx\le L\|Ax\|_\infty+\kappa\|x\|_1\quad\forall x.$$
This means that the condition $\mathcal{H}_{s,1}(\kappa)$ holds true for $H=[L\,I_m,\ 0_{m\times(n-m)}]$. Vice versa, the existence of $\kappa<1/2$ and $H$ satisfying $\mathcal{H}_{s,1}(\kappa)$ clearly implies the validity of (67) with the same $\kappa$, and this implies the $s$-goodness of $A$. ■
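The identity $\|x\|_{s,1}=\max_{u\in U}u^Tx$ used above says that the sum of the $s$ largest magnitudes equals the largest value of $u^Tx$ over all $\pm1$ vectors with exactly $s$ nonzeros. The brute-force NumPy check below confirms it on a small random vector. It enumerates all $\binom{n}{s}2^s$ vectors, so it is only meant for tiny $n$. Both helper names are ours.

```python
import itertools
import numpy as np

def norm_s1(x, s):
    """||x||_{s,1}: sum of the s largest magnitudes of the entries of x."""
    return np.sort(np.abs(x))[::-1][:s].sum()

def max_over_sign_patterns(x, s):
    """max u^T x over all u with exactly s nonzero entries, each equal to +1 or -1."""
    best = -np.inf
    for support in itertools.combinations(range(x.size), s):
        for signs in itertools.product((-1.0, 1.0), repeat=s):
            best = max(best, float(np.dot(signs, x[list(support)])))
    return best

rng = np.random.default_rng(5)
x = rng.standard_normal(8)
print(np.isclose(norm_s1(x, 3), max_over_sign_patterns(x, 3)))
```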



A.3.2 Proof of Lemma 5

Let $A$ satisfy RIP$(\delta,2s)$ with $\delta<1$. We want to prove that then the matrix $\frac{1}{1-\delta}A$ satisfies the condition $\mathcal{H}_{s,2}\big(\frac{\delta}{1-\delta}\big)$. Indeed, let $x\in\mathbb{R}^n$, and obtain vectors $x^1,x^2,\dots,x^q$ from $x$ as follows. $x^1$ is obtained by zeroing all but the $s$ largest in magnitude entries of $x$ and keeping the latter entries intact. Then $x^2$ is obtained by applying the same procedure to $x-x^1$, and so on. We stop at step $q$, where we get $x=x^1+\dots+x^q$.
Observe that $\|x^j\|_\infty\le s^{-1}\|x^{j-1}\|_1$, whence also $\|x^j\|_2\le s^{-1/2}\|x^{j-1}\|_1$ (since $x^j$ is $s$-sparse). We now have
$$\begin{aligned}\sqrt s\,\|x^1\|_2\|A^TAx\|_\infty&\ge\|x^1\|_1\|A^TAx\|_\infty\ge[x^1]^TA^TAx=[x^1]^TA^TAx^1+\sum_{j=2}^q[x^1]^TA^TAx^j\\&\stackrel{(*)}{\ge}(1-\delta)\|x^1\|_2^2-\delta\|x^1\|_2\sum_{j=2}^q\|x^j\|_2\\&\ge(1-\delta)\|x^1\|_2^2-\delta s^{-1/2}\|x^1\|_2\sum_{j=2}^q\|x^{j-1}\|_1\ge(1-\delta)\|x^1\|_2^2-\delta s^{-1/2}\|x^1\|_2\|x\|_1\\ \Rightarrow\ \|x\|_{s,2}=\|x^1\|_2&\le\frac{\sqrt s}{1-\delta}\|A^TAx\|_\infty+\frac{\delta}{1-\delta}\,s^{-1/2}\|x\|_1.\end{aligned}$$
In the above chain, step $(*)$ is valid due to $[x^1]^TA^TAx^1\ge(1-\delta)\|x^1\|_2^2$ (since $A$ is RIP$(\delta,2s)$) and statement (!) (see the proof of Proposition 5). The concluding inequality in the chain says that $\frac{1}{1-\delta}A$ satisfies $\mathcal{H}_{s,2}\big(\frac{\delta}{1-\delta}\big)$. ■
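For a tiny instance, the RIP constant can be computed exactly by enumerating all $2s$-column submatrices. The concluding inequality of the chain,
$$\|x\|_{s,2}\le\frac{\sqrt s}{1-\delta}\|A^TAx\|_\infty+\frac{\delta}{1-\delta}s^{-1/2}\|x\|_1,$$
can then be spot-checked on random vectors. The NumPy sketch below takes $A$ to be a scaled Gaussian matrix with many more rows than columns, which is an assumption made so that $\delta<1$ is essentially guaranteed. The helper names are ours.

```python
import itertools
import numpy as np

def rip_constant(A, k):
    """Exact RIP constant of order k: max over k-column subsets S of ||A_S^T A_S - I||_2.
    Exhaustive search, so only usable for tiny n."""
    delta = 0.0
    for S in itertools.combinations(range(A.shape[1]), k):
        cols = list(S)
        ev = np.linalg.eigvalsh(A[:, cols].T @ A[:, cols])  # ascending eigenvalues
        delta = max(delta, abs(ev[0] - 1.0), abs(ev[-1] - 1.0))
    return delta

def norm_sq(x, s, q):
    """||x||_{s,q}: l_q norm of the s largest-magnitude entries of x."""
    return np.linalg.norm(np.sort(np.abs(x))[::-1][:s], ord=q)

rng = np.random.default_rng(6)
m, n, s = 200, 12, 2
A = rng.standard_normal((m, n)) / np.sqrt(m)
delta = rip_constant(A, 2 * s)
if delta >= 1:
    raise ValueError("Lemma 5 requires delta < 1")

ok = True
for _ in range(2000):
    x = rng.standard_normal(n) * rng.exponential(size=n)
    lhs = norm_sq(x, s, 2)
    rhs = (np.sqrt(s) / (1 - delta)) * np.abs(A.T @ A @ x).max() \
        + (delta / (1 - delta)) * np.abs(x).sum() / np.sqrt(s)
    ok &= lhs <= rhs + 1e-12
print(ok)
```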
A.3.3 Proof of Proposition 7

We present here the proof of (i), which is a straightforward modification of the proof of Proposition 2. The proof of (ii) can be obtained by an equally straightforward modification of the proof of Proposition 4.

Thus, suppose we are under the premise of (i), and let $\Xi$ be defined exactly as in the proof of Proposition 1. Then $\mathrm{Prob}\{\xi\in\Xi\}\ge1-\epsilon$, and $|(\sigma\xi+u)^Th_i|\le\nu(h_i)\le\nu(H)$ for all $\xi\in\Xi$, $u\in\mathcal{U}$ and all $i$. Let us fix $x\in\mathbb{R}^n$, $\xi\in\Xi$ and $u\in\mathcal{U}$, and let $I$ be the set of indices of the $s$ largest in magnitude entries of $x$. Let also $\eta=\sigma\xi+u$, $y=Ax+\eta$, $\widehat{x}=\widehat{x}_{\rm reg}(y)$, and $z=\widehat{x}-x$.

Since $\xi\in\Xi$ and $u\in\mathcal{U}$, we have $|h_i^T(Ax-y)|\le\nu_i\le\rho_i$ for all $i$. Hence $x$ is a feasible solution to the optimization problem defining $\widehat{x}$, and, exactly as in the proof of Proposition 1,
$$\begin{array}{ll}(a)&\|z\|_1\le2\|z_I\|_1+2\|x_J\|_1,\\(b)&|h_i^TAz|\le\rho_i+\nu_i,\ 1\le i\le n\ \Rightarrow\ \|H^TAz\|_\infty\le\rho+\omega.\end{array}\qquad(68)$$

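Now, $H$ satisfies the condition $\mathcal{H}_{s,q}(\kappa)$ and thus satisfies the condition $\mathcal{H}_{s,1}(\kappa)$. Applying the latter condition, we get
$$\|z_I\|_1\le s\|H^TAz\|_\infty+\kappa\|z\|_1.$$

Invoking (68), we conclude that
$$\|z\|_1\le2s\|H^TAz\|_\infty+2\kappa\|z\|_1+2\|x_J\|_1\le2s\big[\rho+\omega+s^{-1}\|x_J\|_1\big]+2\kappa\|z\|_1,\qquad(69)$$
thus
$$\|z\|_1\le\frac{2s}{1-2\kappa}\big[\rho+\omega+s^{-1}\|x_J\|_1\big].\qquad(70)$$
Next, $H$ satisfies $\mathcal{H}_{s,q}(\kappa)$, whence $\|z\|_{s,q}\le s^{1/q}\|H^TAz\|_\infty+\kappa s^{1/q-1}\|z\|_1$. Therefore, (68) and (70) give
$$\|z\|_{s,q}\le s^{1/q}\|H^TAz\|_\infty+\kappa s^{1/q-1}\|z\|_1\le s^{1/q}[\rho+\omega]+2\kappa s^{1/q}\,\frac{\rho+\omega+s^{-1}\|x_J\|_1}{1-2\kappa}\qquad(71)$$
$$\le s^{1/q}\,\frac{\rho+\omega+2\kappa s^{-1}\|x_J\|_1}{1-2\kappa}\le s^{1/q}\,\frac{\rho+\omega+s^{-1}\|x_J\|_1}{1-2\kappa}.\qquad(72)$$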

All we need in order to extract (i) from (69) and (72) is to verify that
$$1\le p\le q\ \Rightarrow\ \|z\|_p\le(3s)^{1/p}\,\theta,\qquad\theta:=\frac{\rho+\omega+s^{-1}\|x_J\|_1}{1-2\kappa}.$$
The desired inequality holds true when $p=1$ (see (69)). Thus, invoking the Hölder inequality, all we need is to verify that
$$\|z\|_q\le(3s)^{1/q}\,\theta.\qquad(73)$$

When $q=\infty$, (73) is implied by (72), so let us assume that $q<\infty$. Let $\lambda$ be the $(s+1)$-st largest of the magnitudes of the entries of $z$. By (72) we have $\lambda^qs\le\|z\|_{s,q}^q\le s\theta^q$, and thus $\lambda\le\theta$. Hence, setting $z'=z-z^s$, we get
$$\|z'\|_q^q\le\lambda^{q-1}\|z'\|_1\le\theta^{q-1}\|z\|_1\le\theta^{q-1}\cdot2s\theta,$$
where the concluding inequality is given by (69). Thus, $\|z'\|_q^q\le2s\theta^q$, while $\|z^s\|_q^q\le s\theta^q$ by (72). We see that $\|z\|_q^q\le\|z^s\|_q^q+\|z'\|_q^q\le3s\theta^q$, as required in (73). ■
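The verification of (73) uses two elementary facts about the $(s+1)$-st largest magnitude $\lambda$ of $z$. First, $s\lambda^q\le\|z\|_{s,q}^q$. Second, every entry of $z'=z-z^s$ has magnitude at most $\lambda$, so $\|z'\|_q^q\le\lambda^{q-1}\|z'\|_1$. The NumPy spot check below tests both on random vectors and random $q\in[1,8)$.

```python
import numpy as np

rng = np.random.default_rng(7)
ok = True
for _ in range(2000):
    n, s = 30, 5
    q = rng.uniform(1.0, 8.0)
    z = rng.standard_normal(n) * rng.exponential(size=n)
    a = np.sort(np.abs(z))[::-1]   # magnitudes in decreasing order
    head, tail = a[:s], a[s:]      # magnitudes of z^s and of z' = z - z^s
    lam = a[s]                     # (s+1)-st largest magnitude
    ok &= s * lam**q <= np.sum(head**q) + 1e-12                      # s*lambda^q <= ||z||_{s,q}^q
    ok &= np.sum(tail**q) <= lam**(q - 1) * tail.sum() * (1 + 1e-12)  # ||z'||_q^q <= lambda^{q-1}||z'||_1
print(ok)
```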
A.3.4 Proof of Proposition 8

(i): Let $\Xi=\{\xi\in\mathbb{R}^m:\ \sigma|\xi^Ta_i|\le\varrho,\ 1\le i\le n\}$, so that $\mathrm{Prob}\{\xi\in\Xi\}\ge1-\epsilon$. Let us fix $\xi\in\Xi$ and $x\in\mathbb{R}^n$, and let $y=Ax+\sigma\xi$, $\widehat{x}=\widehat{x}_{\rm DS}(y)$. We have $\|A^T(Ax-y)\|_\infty=\|\sigma A^T\xi\|_\infty\le\varrho\le\rho$, so that $x$ is a feasible solution to the optimization problem specifying $\widehat{x}_{\rm DS}(y)$, and therefore $\|\widehat{x}\|_1\le\|x\|_1$. Denoting by $I$ the support of $x^s$, setting $z=\widehat{x}-x$ and acting exactly as when deriving (48), we arrive at
$$\|z\|_1\le2\|z_I\|_1+2\|x_J\|_1.\qquad(74)$$
Further,
$$\|A^TAz\|_\infty=\|A^T(A\widehat{x}-y+\sigma\xi)\|_\infty\le\|A^T(A\widehat{x}-y)\|_\infty+\|\sigma A^T\xi\|_\infty\le\rho+\varrho,$$
and therefore
$$\|Az\|_2^2=z^TA^TAz\le\|z\|_1\|A^TAz\|_\infty\le(\rho+\varrho)\|z\|_1.$$
On the other hand, by (37) we have
$$\|z_I\|_1\le\|z\|_{s,1}\le s^{1-1/q}\|z\|_{s,q}\le s\lambda\|Az\|_2+\kappa\|z\|_1\le s\lambda(\rho+\varrho)^{1/2}\|z\|_1^{1/2}+\kappa\|z\|_1.\qquad(75)$$

Substituting the above bound into (74), we get
$$\|z\|_1\le2\kappa\|z\|_1+2s\lambda(\rho+\varrho)^{1/2}\|z\|_1^{1/2}+2\|x_J\|_1,$$
whence by elementary calculations
$$r:=\|z\|_1^{1/2}\le a+b^{1/2},\quad\text{where}\quad a=\frac{2s\lambda\sqrt{\rho+\varrho}}{1-2\kappa},\quad b=\frac{2\|x_J\|_1}{1-2\kappa}.\qquad(76)$$



Invoking (37) and (75), we have
$$\begin{aligned}\|z\|_{s,q}&\le s^{1/q}\lambda\|Az\|_2+\kappa s^{1/q-1}\|z\|_1\le s^{1/q-1}\big[s\lambda\sqrt{\rho+\varrho}\,r+\kappa r^2\big]=s^{1/q-1}\Big[\frac{1-2\kappa}{2}\,ar+\kappa r^2\Big]\\&\le s^{1/q-1}\Big[\frac{1-2\kappa}{2}\,a\big(a+b^{1/2}\big)+\kappa\big(a+b^{1/2}\big)^2\Big]\le\frac{s^{1/q-1}}{2}\big[a+b^{1/2}\big]^2,\end{aligned}\qquad(77)$$
where the last inequality of the chain is due to $\kappa<\frac12$. Assume for a moment that $1<q<\infty$, and denote by $\mu$ the $(s+1)$-st largest magnitude of the entries of $z$. From the latter inequality we conclude that $\mu\le\frac{1}{2s}\big[a+b^{1/2}\big]^2$. Hence, setting $z'=z-z^s$, we obtain (cf. the verification of (73)) $\|z'\|_q^q\le\mu^{q-1}\|z'\|_1\le(2s)^{1-q}\big[a+b^{1/2}\big]^{2q}$. Invoking (77) one more time, we get
$$\|z\|_q^q\le\|z^s\|_q^q+\|z'\|_q^q\le3s\Big[\frac{[a+b^{1/2}]^2}{2s}\Big]^q.$$

The resulting inequality combines with (76) and the Hölder inequality to imply that
$$\|z\|_p\le\frac{(3s)^{1/p}}{2s}\big[a+b^{1/2}\big]^2\le3^{1/p}s^{1/p-1}\big[a^2+b\big],\quad1\le p\le q.\qquad(78)$$

Note that the derivation of (78) was carried out under the additional assumption that $1<q<\infty$. This assumption can now be removed. When $q=1$, (78) is readily given by (76). When $q=\infty$, $A$ satisfies (37) for $q=\infty$, and thus for every value of $q$ from $[1,\infty]$. This means that (78) holds true for every $q<\infty$, whence (78) holds true for $q=\infty$ as well.

Recalling that relation (78) is valid whenever $\xi\in\Xi$ and $x\in\mathbb{R}^n$, and plugging in the values of $a$ and $b$, we arrive at (40). (i) is proved.
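The "elementary calculations" behind (76) and the last steps of (77)–(78) are the following.

- If $r^2\le ar+b$ with $a,b\ge0$, then $r\le\frac{a+\sqrt{a^2+4b}}{2}\le a+b^{1/2}$.
- If $R\ge a$ and $0\le\kappa\le\frac12$, then $\frac{1-2\kappa}{2}aR+\kappa R^2\le\frac12R^2$.
- $(a+b^{1/2})^2\le2(a^2+b)$.

A randomized NumPy spot check of the three facts:

```python
import numpy as np

rng = np.random.default_rng(8)
ok = True
for _ in range(5000):
    a, b = rng.exponential(), rng.exponential()
    # Largest r >= 0 with r^2 <= a*r + b is the positive root of r^2 - a*r - b = 0.
    r_max = (a + np.sqrt(a * a + 4 * b)) / 2
    ok &= r_max <= a + np.sqrt(b) + 1e-12                       # (76)
    kappa, R = rng.uniform(0, 0.5), a + np.sqrt(b)
    ok &= (1 - 2 * kappa) / 2 * a * R + kappa * R**2 <= R**2 / 2 + 1e-12  # last step of (77)
    ok &= (a + np.sqrt(b))**2 <= 2 * (a * a + b) + 1e-12         # last step of (78)
print(ok)
```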
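(ii): As above, let $\Xi=\{\xi:\ |a_i^T\xi|\le\sqrt{2\ln(n/\epsilon)}\|a_i\|_2,\ 1\le i\le n\}$, so that $\mathrm{Prob}\{\xi\notin\Xi\}\le\epsilon$. Let us fix $x\in\mathbb{R}^n$ and $\xi\in\Xi$, and let $I$ be the support of $x^s$. Let also $y=Ax+\sigma\xi$, $\widehat{x}=\widehat{x}_{\rm lasso}(y)$, $z=\widehat{x}-x$. We have
$$\|\widehat{x}\|_1+\varkappa\|A\widehat{x}-y\|_2^2\le\|x\|_1+\varkappa\|Ax-y\|_2^2=\|x\|_1+\sigma^2\varkappa\,\xi^T\xi,$$
or, equivalently, since $A\widehat{x}-y=Az-\sigma\xi$,
$$\|\widehat{x}\|_1+\varkappa\|Az\|_2^2-2\sigma\varkappa\,\xi^TAz\le\|x\|_1.$$

It follows that
$$\begin{aligned}0&\ge\|\widehat{x}\|_1-\|x\|_1+\varkappa\big(\|Az\|_2^2-2\sigma\xi^TAz\big)\\&=\big(\|x_I+z_I\|_1-\|x_I\|_1\big)+\big(\|x_J+z_J\|_1-\|x_J\|_1\big)+\varkappa\big(\|Az\|_2^2-2\sigma\xi^TAz\big)\\&\ge-\|z_I\|_1+\|z_J\|_1-2\|x_J\|_1+\varkappa\big(\|Az\|_2^2-2\sigma\xi^TAz\big)\\&\ge-\|z_I\|_1+\|z_J\|_1-2\|x_J\|_1-2\varkappa\varrho\|z\|_1+\varkappa\|Az\|_2^2,\end{aligned}$$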



where the last $\ge$ is readily given by the fact that $\|\sigma A^T\xi\|_\infty\le\varrho$ for $\xi\in\Xi$. We conclude that
$$\|z_J\|_1\le\|z_I\|_1+2\varkappa\varrho\|z\|_1-\varkappa\|Az\|_2^2+2\|x_J\|_1,$$
and therefore
$$\|z\|_1\le2\|z_I\|_1+2\varkappa\varrho\|z\|_1-\varkappa\|Az\|_2^2+2\|x_J\|_1.\qquad(79)$$
Now, we have
$$\|z_I\|_1\le s^{1-1/q}\|z\|_{s,q}\le s\lambda\|Az\|_2+\kappa\|z\|_1,$$
where the concluding inequality is given by (37). Combining the resulting inequality with (79), we get
$$\|z\|_1(1-2\kappa-2\varkappa\varrho)\le2s\lambda\|Az\|_2-\varkappa\|Az\|_2^2+2\|x_J\|_1.$$
Combining this inequality with (79), we get the first inequality in the following chain:
$$\|z\|_1\le2(\kappa+\varkappa\varrho)\|z\|_1+2\|x_J\|_1+\varkappa\Big[\frac{2s\lambda}{\varkappa}\|Az\|_2-\|Az\|_2^2\Big]\le2(\kappa+\varkappa\varrho)\|z\|_1+2\|x_J\|_1+\frac{s^2\lambda^2}{\varkappa},\qquad(80)$$
and since $2\kappa+2\varkappa\varrho<1$, we arrive at

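$$\|z\|_1\le a:=\frac{1}{1-2(\kappa+\varkappa\varrho)}\Big[\frac{s^2\lambda^2}{\varkappa}+2\|x_J\|_1\Big].\qquad(81)$$

Since $2(\kappa+\varkappa\varrho)<1$, the first inequality in (80) is possible only if $\varkappa\|Az\|_2^2-2s\lambda\|Az\|_2-2\|x_J\|_1\le0$, whence
$$\|Az\|_2\le\frac{2s\lambda}{\varkappa}+\frac{\|x_J\|_1}{s\lambda}.\qquad(82)$$

Invoking (37), we get $\|z\|_{s,q}\le s^{1/q}\big[\lambda\|Az\|_2+\kappa s^{-1}\|z\|_1\big]$, which combines with (82) and (81) to imply that
$$\|z\|_{s,q}\le2s^{1/q-1}a.\qquad(83)$$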
Denoting by $\mu$ the $(s+1)$-st largest of the magnitudes of the entries of $z$, we conclude from (83) that $\mu\le2s^{-1}a$, whence, denoting $z'=z-z^s$,
$$\|z'\|_q\le\mu^{(q-1)/q}\|z'\|_1^{1/q}\le\big(2s^{-1}a\big)^{(q-1)/q}\|z\|_1^{1/q}\le2^{(q-1)/q}s^{1/q-1}a$$
(we have used (81)). This combines with (83) to imply that
$$\|z\|_q\le4s^{1/q-1}a.\qquad(84)$$
Combining (84), (81) and the Hölder inequality, we get
$$1\le p\le q\ \Rightarrow\ \|z\|_p\le4s^{1/p-1}a.\qquad(85)$$
Plugging in the value of $a$ (see (81)), and recalling that (85) takes place whenever $\xi\in\Xi$, where $\mathrm{Prob}\{\xi\in\Xi\}\ge1-\epsilon$, we arrive at (41). ■
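Two scalar facts drive (80)–(82). The first is $\max_t\,(2s\lambda t-\varkappa t^2)=s^2\lambda^2/\varkappa$. The second is that, if $\varkappa t^2-2s\lambda t-2\|x_J\|_1\le0$ with $t\ge0$, then $t\le\frac{2s\lambda}{\varkappa}+\frac{\|x_J\|_1}{s\lambda}$. In the NumPy spot check below, `s_lam`, `kap` and `xJ` stand for $s\lambda$, $\varkappa$ and $\|x_J\|_1$ and are drawn at random. Relative tolerances are used because the quantities can be large.

```python
import numpy as np

rng = np.random.default_rng(9)
ok = True
for _ in range(5000):
    s_lam, kap, xJ = rng.exponential(), rng.exponential(), rng.exponential()
    # Grid maximum of 2*s_lam*t - kap*t^2 never exceeds s_lam^2/kap (used in (80)).
    t = np.linspace(0, 10 * s_lam / kap, 2001)
    ok &= np.max(2 * s_lam * t - kap * t**2) <= s_lam**2 / kap * (1 + 1e-9)
    # Largest t >= 0 with kap*t^2 - 2*s_lam*t - 2*xJ <= 0, compared with the bound (82).
    t_max = (s_lam + np.sqrt(s_lam**2 + 2 * kap * xJ)) / kap
    ok &= t_max <= (2 * s_lam / kap + xJ / s_lam) * (1 + 1e-9)
print(ok)
```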



A.4 Proof of Proposition 9

The proof below follows the lines of the proof of Proposition 7 of [15]. Given $\epsilon\in(0,1)$, let $\Xi=\{\xi:\ |h_i^T\xi|\le\sqrt{2\ln(n/\epsilon)}\|h_i\|_2,\ 1\le i\le n\}$, so that $\mathrm{Prob}\{\xi\in\Xi\}\ge1-\epsilon$. Let us fix $\xi\in\Xi$, $x\in\mathbb{R}^n$ such that $\|x-x^s\|_1\le\upsilon$, and $u\in\mathcal{U}$. For $\eta=y-Ax=u+\sigma\xi$, by the definition (4) of the norm $\nu$ and because of $\nu(h_i)\le\nu(H)$, we have $\|H^T\eta\|_\infty\le\nu(H)\le\omega_*(\gamma)$.

We intend to prove the relations $(a_k)$, $(b_k)$ by induction in $k$. First, let us show that $(a_{k-1},b_{k-1})$ implies $(a_k,b_k)$. Thus, assume that $(a_{k-1},b_{k-1})$ holds true, and let $z^{(k-1)}=x-v^{(k-1)}$. By $(a_{k-1})$, $z^{(k-1)}$ is supported on the support of $x$. Note that
$$z^{(k-1)}-u=x-v^{(k-1)}-H^T\big(y-Av^{(k-1)}\big)=\big(I-H^TA\big)\big(x-v^{(k-1)}\big)-H^T\eta=\big(I-H^TA\big)z^{(k-1)}-H^T\eta.$$

Then by (46) for any $1\le i\le n$,



consequently, 



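$$-\vartheta:=-\gamma\alpha_{k-1}-\omega_*(\gamma)\le z_i^{(k-1)}-u_i\le\gamma\alpha_{k-1}+\omega_*(\gamma)=:\vartheta,\qquad(86)$$
so that the segment $S_i=[u_i-\vartheta,u_i+\vartheta]$ of width $\ell=2\vartheta=2\gamma\alpha_{k-1}+2\omega_*(\gamma)$ covers $z_i^{(k-1)}$. The point of this segment closest to zero is
$$\Delta_i=\begin{cases}\big[|u_i|-\vartheta\big]_+,&u_i\ge0,\\-\big[|u_i|-\vartheta\big]_+,&u_i<0,\end{cases}$$
that is, for all $i$, $\Delta_i$ coincides with the $i$-th entry of the correction applied at the $k$-th step. Since the segment $S_i$ covers $z_i^{(k-1)}$ and $\Delta_i$ is the point of $S_i$ closest to $0$, while the width of $S_i$ is at most $\ell$, we clearly have
$$(a)\ \Delta_i\in\mathrm{Conv}\{0,z_i^{(k-1)}\},\qquad(b)\ \big|z_i^{(k-1)}-\Delta_i\big|\le\ell.\qquad(87)$$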

Since $(a_{k-1})$ is valid, (87.a) implies that
$$v_i^{(k)}=v_i^{(k-1)}+\Delta_i\in v_i^{(k-1)}+\mathrm{Conv}\big\{0,x_i-v_i^{(k-1)}\big\}\subset\mathrm{Conv}\{0,x_i\},$$
and $(a_k)$ holds. Further, let $I$ be the support of $x^s$. Relation $(a_k)$ clearly implies that $|v_i^{(k)}|\le|x_i|$, and, due to (87.b), we can write
$$\|x-v^{(k)}\|_1=\sum_{i\in I}\big|x_i-\big[v_i^{(k-1)}+\Delta_i\big]\big|+\sum_{i\notin I}\big|x_i-v_i^{(k)}\big|\le\sum_{i\in I}\big|z_i^{(k-1)}-\Delta_i\big|+\sum_{i\notin I}|x_i|\le s\ell+\upsilon=\alpha_k.$$
Since by (87.b)
$$\|x-v^{(k)}\|_\infty=\|z^{(k-1)}-\Delta\|_\infty\le\ell=2\gamma\alpha_{k-1}+2\omega_*(\gamma),$$
we conclude that $(b_k)$ holds true. The induction step is justified.

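It remains to show that $(a_0,b_0)$ holds true. Since $(a_0)$ is evident, all we need is to justify $(b_0)$. Let
$$\alpha^*=\|x\|_1,$$
and let $u=H^Ty$. As above (cf. (86)), we have for all $i$:
$$|x_i-u_i|\le\gamma\alpha^*+\omega_*(\gamma).$$
Then
$$\alpha^*=\sum_{i\in I}|x_i|+\sum_{i\notin I}|x_i|\le\sum_{i\in I}\big[|u_i|+\gamma\alpha^*+\omega_*(\gamma)\big]+\upsilon\le\|u\|_{s,1}+s\gamma\alpha^*+s\omega_*(\gamma)+\upsilon.$$
Hence
$$\alpha^*\le\alpha_0=\frac{\|u\|_{s,1}+s\omega_*(\gamma)+\upsilon}{1-s\gamma},$$
which implies $(b_0)$.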
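The per-coordinate step in the induction takes the point of $[u_i-\vartheta,u_i+\vartheta]$ closest to zero, which is soft thresholding of $u_i$ at level $\vartheta$. The NumPy sketch below draws $z$ and $u$ with $|z_i-u_i|\le\vartheta$, as guaranteed by (86). It then checks the two properties (87.a) and (87.b) on which the induction relies. The function name `closest_to_zero` is ours.

```python
import numpy as np

def closest_to_zero(u, theta):
    """Entrywise closest-to-zero point of [u_i - theta, u_i + theta],
    i.e. soft thresholding sign(u_i) * max(|u_i| - theta, 0)."""
    return np.sign(u) * np.maximum(np.abs(u) - theta, 0.0)

rng = np.random.default_rng(10)
ok = True
for _ in range(5000):
    theta = rng.exponential()
    z = 3 * rng.standard_normal(50)
    u = z + rng.uniform(-theta, theta, size=50)  # |z_i - u_i| <= theta, as in (86)
    d = closest_to_zero(u, theta)
    # (87.a): d_i lies between 0 and z_i (same sign, no larger in magnitude).
    ok &= np.all(d * z >= 0) and np.all(np.abs(d) <= np.abs(z) + 1e-12)
    # (87.b): |z_i - d_i| <= width of the segment = 2*theta.
    ok &= np.all(np.abs(z - d) <= 2 * theta + 1e-12)
print(ok)
```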